PROTEIN-PROTEIN INTERACTIONS AND METHODS FOR IDENTIFYING
INTERACTING PROTEINS AND THE AMINO ACID SEQUENCE AT THE SITE OF 
INTERACTION 

FIELD OF THE INVENTION 

The present invention relates to proteomics. More specifically, the invention relates to
protein-protein interactions and methods for identifying interacting proteins and the amino acid 
sequence at the site of interaction. 

BACKGROUND OF THE INVENTION 

Specific protein-protein interactions are critical events in biological processes. Protein- 
protein interactions govern biological processes that handle cellular information flow and control 
cellular decisions (e.g., signal transduction, cell cycle regulation and assembly of cellular 
structures). The entire network of interactions between cellular proteins is a biological chart of 
functional events that regulate the internal working of living organisms and their responses to 
external signals. A necessary step for the completion of this biological interaction chart is the 
knowledge of all the gene sequences in a given living organism. The entire DNA sequence of the 
Homo sapiens genome will be completed at the latest by the year 2003 (H29). Unfortunately,
the sequence of a gene reveals neither its biological function nor its position in the biological
chart. Given the expected number of proteins encoded by the human genome (80,000 to 120,000), the
mapping of the biological chart of protein-protein interactions will be an enormous but
rewarding task.

During the past few decades, several techniques have been developed
to determine the interactions between proteins (for review, see ££2Q)). These techniques include:
i) physical methods to select and detect interacting proteins (e.g., protein affinity
chromatography, co-immunoprecipitation, crosslinking, and affinity blotting); ii) library-based
methods (e.g., phage display and two-hybrid systems); and iii) genetic methods (e.g.,
overproduction phenotype, synthetic lethal effects and unlinked noncomplementation). Of the
above-mentioned methods for detecting protein-protein interactions, the two-hybrid systems are
the most popular and the most extensively used. In the classical two-hybrid system,
transcription of reporter genes depends on an interaction between a DNA-bound "bait" protein
and an activation-domain-containing "prey" protein. The two-hybrid systems unfortunately may
suffer from a number of disadvantages. For example, the interaction of proteins is monitored in
the nuclear milieu rather than in the cytoplasm, where most proteins are found; the system does
not allow the simultaneous identification of the precise amino acid sequences between two
interacting proteins; and it cannot be easily applied to different cell types or tissues in which
different interacting proteins may be expressed.

It has been previously demonstrated that small synthetic peptides can 
bind to proteins (1, 1§£, 42££, ©102). Nevertheless, the use of synthetic peptides in a systematic 
approach to identify interacting protein domains and sequences has not been proposed or 
provided. Certain signature domains have been shown to bind with high affinity to specific 
peptide sequences (e.g., the Src homology-2 or SH2 domain of Src-family kinases binds tightly to
a phosphorylated tyrosine (Y*-EEI) sequence found in the epidermal growth factor receptor and the
focal adhesion kinase) (461).

There thus remains a need to provide a method which enables
identification of i) the exact amino acid sequences of at least one binding partner between
interacting proteins; ii) numerous, possibly all, interacting proteins in different cells or tissues;
and iii) the specific domains (or sequences) between two interacting proteins as targets for
isolation of lead drugs. In addition, there remains a need to provide methods and assays which
enable the identification of the precise amino acid sequence of interacting domains of proteins
significantly faster than conventional methods (e.g., days instead of months).

The present invention seeks to meet these and other needs. 

The present description refers to a number of documents, the contents
of which are herein incorporated by reference in their entirety.

SUMMARY OF THE INVENTION

The present invention seeks to overcome the drawbacks of the prior
art. More specifically, the invention concerns an approach to identify protein-protein interaction
domains which differs from the prior art. Moreover, one approach of the present invention is
based on an understanding of the principle that governs protein-protein interactions. Such
understanding therefore allows the use of several methods. One such method is exemplified in
detail below to identify: i) at least one of the exact amino acid sequences between interacting
proteins; ii) a number of, possibly all, interacting proteins in different cells or tissues; and iii) the
specific domains (or sequences) between two interacting proteins as targets for isolation of lead
drugs. Preferably, the method and assay of the present invention enable a determination of i),
ii) and iii). Moreover, unlike the approaches of the prior art, the method described herein allows
for the identification of interacting proteins and the precise amino acid sequences of interactions
in several days as opposed to several months.

The ability to select proteins (or other molecules) that block 
interactions between a gene product and some partners but not others, should allow sophisticated 
modulation of cellular signaling or cell metabolism in human cells and other currently intractable 
systems. Indeed, the identification of proteins that interact with a therapeutically important 
protein and the identification of the sites of interaction may be more relevant to drug 
development than other genetic approaches such as "knock-outs" Q1&). The latter addresses the 
phenotypic consequences of disrupting all of the interactions in which a given protein is involved 
as opposed to inhibiting the interaction of one protein (at worst, of a few proteins as opposed to
all) in a multimeric complex.

The present invention further relates to a novel approach in drug 
discovery. A major obstacle in drug development for the treatment of diseases has been
the identification of target proteins and their functional sites. In fact, most research and 
development (R&D) projects in pharmaceutical companies take several years to identify a valid 
target protein. The selection of drugs that bind to and inhibit the functions of these proteins takes 
several years and is generally non-specific and random. Furthermore, drugs identified by current
approaches often target the active sites in proteins. Such drugs thus often lead to major
side-effects. Therefore, it is not surprising that many R&D projects never lead to the development of
specific drugs even after three to five years of intensive research efforts. The methods and assays 
to identify protein-protein interactions of the present invention may address three important steps 
in the development of drugs: 

1) the identification of the amino acid sequences of all interacting 
domains in target proteins; 

2) the identification of a set of interacting proteins (preferably all 
interacting proteins) for drug development; and 

3) screening for specific drugs against each of the interacting 
domains in a target protein. 

P-glycoprotein (P-gp) has been shown to cause multidrug resistance in
tumor cell lines selected with lipophilic anticancer drugs. Analysis of the P-gp amino acid sequence
has led to a proposed model of a duplicated molecule with two hydrophobic and hydrophilic
domains linked by a highly charged region of about 90 amino acids, the linker domain.
Although similarly charged domains are found in other members of the P-gp superfamily, the
function(s) of this domain are not known. Herein, it is demonstrated using the method of the
present invention that this domain binds to other cellular proteins. Using overlapping
hexapeptides that span the entire amino acid sequences of the linker domains of human
P-glycoprotein gene 1 and 3 (HP-gp1 and HP-gp3), a direct and specific binding between the HP-gp1
and HP-gp3 linker domains and intracellular proteins is shown herein. Three different stretches
(617 EKGIYFKLVTM 627 (SEQ ID NO: 1), 658 SRSSLIRKRSTRRSVRGSQA 677 (SEQ ID NO: 2)
and 694 PVSFWRTMKLNLT 706 (SEQ ID NO: 3) for HP-gp1 and 618 LMKKEGVYFKLVNM 631
(SEQ ID NO: 4), 648 KAATRMAPNGWKSRLFRHSTQKNLKNS 674 (SEQ ID NO: 5) and
695 PVSFLKVLKLNKT 707 (SEQ ID NO: 6) for HP-gp3) in the linker domains specifically
bound to proteins with apparent molecular masses of ~80 kDa, 57 kDa and 30 kDa.
Interestingly, only the 57 kDa protein was bound, to varying degrees, to the three different
sequences in the linker domain. Moreover, the binding between the overlapping peptides
encoding the linker sequence and the 57 kDa protein was resistant to the zwitterionic
detergent, CHAPS, but was sensitive to SDS. Purification and partial N-terminal amino acid
sequencing of the 57 kDa protein showed that it contains the N-terminal amino acids of alpha- and
beta-tubulins. Further, Western blot analysis using monoclonal antibodies that bind to
α- and β-tubulins confirmed the identity of the 57 kDa protein. Taken together, this is the first example
showing protein interactions with the P-gp linker domain. This may of course be important to
the overall function of P-gp. More importantly, the results in this study demonstrate the novel
concept whereby the interactions between two proteins are mediated by strings of a few amino
acids with high and repulsive binding energies.
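
By way of illustration only, the way positive overlapping hexapeptides could be collapsed into contiguous stretches such as those listed above may be sketched as follows. The function name `merge_bound_windows` and the particular list of positive hexapeptides are hypothetical and are chosen solely so that the merged result reproduces two of the HP-gp1 stretches named in this description; they are not taken from the experimental data.

```python
def merge_bound_windows(windows):
    """Merge overlapping or adjacent (start, end) residue windows into stretches.

    Each window is the first and last residue number of a peptide that
    remained bound after washing. Numbering follows the parent protein.
    """
    stretches = []
    for start, end in sorted(windows):
        if stretches and start <= stretches[-1][1] + 1:
            # Window overlaps or touches the current stretch: extend it.
            stretches[-1][1] = max(stretches[-1][1], end)
        else:
            stretches.append([start, end])
    return [tuple(s) for s in stretches]


# Hypothetical positives: hexapeptides starting at residues 617-622 and 694-701.
bound = [(s, s + 5) for s in range(617, 623)] + [(s, s + 5) for s in range(694, 702)]
stretches = merge_bound_windows(bound)
print(stretches)
```

Under these assumed inputs, the two merged stretches span residues 617 to 627 and 694 to 706, the same boundaries as SEQ ID NO: 1 and SEQ ID NO: 3.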

In accordance with one embodiment of the present invention, there is
provided a method of identifying a high-affinity interacting domain in a chosen protein, domain
thereof, or part thereof, and the amino acid sequence thereof, comprising: a) providing a set of
overlapping peptides spanning a complete sequence of the chosen protein, domain thereof, or
part thereof, covalently bound to a support; b) providing a mixture of proteins and/or a mixture
of peptides; c) incubating the set of overlapping peptides of a) with the mixture of b), under
conditions enabling the binding between a high-affinity interacting domain in a peptide of the set
and one or more proteins or peptides of b) to occur; d) washing off any protein-protein
interaction which is not a high-affinity interaction of c); and e) identifying which peptide of a)
interacts with high affinity to a protein or peptide of b), thereby identifying the peptide of e) and
the sequence thereof as a high-affinity interacting domain.

In accordance with another embodiment of the present invention, there is
provided a method of identifying an agent which modulates an interaction between high-affinity
interacting domains between a set of overlapping peptides spanning a complete sequence of a
chosen protein, domain thereof or part thereof, covalently bound to a support and a mixture of
proteins and/or a mixture of peptides, comprising: a) incubating the set of overlapping peptides
with the mixture in the presence of at least one agent, under conditions enabling the binding
between a high-affinity interacting domain in a peptide of the set and one or more proteins or
peptides of the mixture to occur; b) washing off any protein-protein interaction which is not a
high-affinity interaction of a); and c) identifying which peptide of a) interacts with high affinity
to a protein or peptide of the mixture in the presence of the agent as compared to in the absence
thereof; thereby identifying the agent as a modulator of the high-affinity interaction when the
interaction in the presence of the agent is measurably different from that in the absence thereof.
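
The comparison of step c) can be illustrated with a minimal sketch. It assumes that each support-bound peptide yields a numerical binding signal after washing, and that "measurably different" is taken, purely for illustration, to mean at least a two-fold change. The function name `modulated_peptides`, the `min_fold_change` parameter and all signal values are hypothetical.

```python
def modulated_peptides(signal_without, signal_with, min_fold_change=2.0):
    """Return {peptide: fold_change} for peptides whose binding signal changes
    by at least `min_fold_change` (up or down) in the presence of the agent."""
    hits = {}
    for peptide, baseline in signal_without.items():
        treated = signal_with.get(peptide, 0.0)
        if baseline <= 0:
            if treated <= 0:
                continue  # no binding in either condition
            fold = float("inf")  # binding appears only with the agent
        else:
            fold = treated / baseline
        if fold >= min_fold_change or fold <= 1.0 / min_fold_change:
            hits[peptide] = fold
    return hits


# Hypothetical washed-support signals (arbitrary units) for three hexapeptides.
without_agent = {"EKGIYF": 100.0, "KGIYFK": 95.0, "PVSFWR": 80.0}
with_agent = {"EKGIYF": 20.0, "KGIYFK": 90.0, "PVSFWR": 85.0}
hits = modulated_peptides(without_agent, with_agent)
print(hits)
```

In this assumed data set only the first peptide shows a change beyond the chosen threshold, so the agent would be flagged as a candidate modulator of the interaction at that peptide alone. The threshold itself is a design choice and would in practice depend on the reproducibility of the detection method.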

In accordance with yet another embodiment of the present invention, there are
provided agents identified as modulators of the high-affinity protein interactions of the present
invention.

For the purpose of the present invention, the following abbreviations and 
terms are defined below. 

DEFINITIONS 

The terminology "overlapping peptides spanning a peptide sequence" (e.g., a domain, a
full length protein sequence or a part thereof) or the like refers to peptides of a chosen size, based 
on the sequence of the protein (or part thereof). Preferably, these peptides are synthetic peptides. 

As explained hereinbelow, the size of the overlapping peptides has a 
significant impact on the workings of the present invention. For example, peptides of four 
contiguous amino acids appear to significantly increase the low affinity binding of proteins 
thereto. Moreover, the use of larger peptides, such as 20 amino acids or longer, would be
expected to increase the proportion of repulsive amino acids to high affinity amino acids, thereby 
masking or totally inhibiting the binding of specific proteins to the peptides. Thus, while the 
person of ordinary skill would understand that there are trade-offs associated with the choice of 
small peptides as opposed to larger ones, the preferred size for the overlapping peptides of the 
present invention is between 5 and 15 amino acids, more preferably between 5 and 12, and 
especially preferably between 5 and 10 amino acids. 
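
As a non-limiting illustration, a set of overlapping peptides of a chosen size can be derived from a parent sequence with a simple sliding window. In the sketch below, the function name `overlapping_peptides`, the one-residue `step` and the `first_residue` numbering parameter are assumptions made for the example; the size check merely mirrors the preferred range of 5 to 15 amino acids stated above.

```python
def overlapping_peptides(sequence, size=6, step=1, first_residue=1):
    """Yield (start, end, peptide) for each window of `size` residues.

    `start` and `end` are residue numbers in the parent protein, assuming the
    first residue of `sequence` is numbered `first_residue`.
    """
    if not 5 <= size <= 15:
        raise ValueError("size is outside the preferred range of 5 to 15 residues")
    if step < 1:
        raise ValueError("step must be at least 1")
    for i in range(0, len(sequence) - size + 1, step):
        start = first_residue + i
        yield start, start + size - 1, sequence[i:i + size]


# Residues 617-627 of the HP-gp1 linker domain (SEQ ID NO: 1).
fragment = "EKGIYFKLVTM"
peptides = list(overlapping_peptides(fragment, size=6, step=1, first_residue=617))
for start, end, peptide in peptides:
    print(start, end, peptide)
```

With hexapeptides offset by one residue, the 11-residue fragment yields six peptides, from EKGIYF (617 to 622) to FKLVTM (622 to 627). A larger step reduces the number of peptides to synthesize at the cost of coarser mapping.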

The term "support", in the context of a support to which the
overlapping peptides of the present invention are covalently bound, can be chosen from a
multitude of supports found in the art. Such supports include CHIPS, plates (e.g., 96-well plates),
glass beads and the like. The CHIP technology is well-known in the art. References relating
thereto include Debouck et al., Nat Genet. 1999 Jan;21(1 Suppl):48-50, Review; Brown et al.,
Nat Genet. 1999 Jan;21(1 Suppl):33-7, Review; Cheung et al., Nat Genet. 1999 Jan;21(1
Suppl):15-9, Review; Duggan et al., Nat Genet. 1999 Jan;21(1 Suppl):10-4, Review; Schena et
al., Trends Biotechnol. 1998 Jul;16(7):301-6, Review; and Ramsay et al., Nat Biotechnol. 1998
Jan;16(1):40-4, Review (19, 24, 26, 85, 97).

Protein sequences are presented herein using the one letter or three 
letter amino acid symbols as commonly used in the art and in accordance with the 
recommendations of the IUPAC-IUB Biochemical Nomenclature Commission. 

Unless defined otherwise, the scientific and technological terms and
nomenclature used herein have the same meaning as commonly understood by a person of
ordinary skill in the art to which this invention pertains. Generally, the procedures for cell cultures,
infection, molecular biology methods and the like are common methods used in the art. Such
standard techniques can be found in reference manuals such as, for example, Sambrook et al.
(1989, Molecular Cloning, A Laboratory Manual, Cold Spring Harbor Laboratories) and Ausubel
et al. (1994, Current Protocols in Molecular Biology, Wiley, New York).

The present description refers mainly to protein or recombinant
DNA (rDNA) technology terms. Definitions of selected terms are provided for clarity and consistency.

As used herein, "nucleic acid molecule" refers to a polymer of
nucleotides. Non-limiting examples thereof include DNA (e.g. genomic DNA, cDNA) and RNA 
molecules (e.g. mRNA). The nucleic acid molecule can be obtained by cloning techniques or 
synthesized. DNA can be double-stranded or single-stranded (coding strand or non-coding 
strand [antisense]). 

The term "recombinant DNA" as known in the art refers to a DNA 
molecule resulting from the joining of DNA segments. This is often referred to as genetic 
engineering. 

The term "DNA segment" is used herein to refer to a DNA molecule
comprising a linear stretch or sequence of nucleotides. This sequence, when read in accordance
with the genetic code, can encode a linear stretch or sequence of amino acids which can be
referred to as a polypeptide, protein, protein fragment and the like.

The terminology "amplification pair" refers herein to a pair of
oligonucleotides (oligos) of the present invention, which are selected to be used together in
amplifying a selected nucleic acid sequence by one of a number of types of amplification
processes, preferably a polymerase chain reaction. Other types of amplification processes include
ligase chain reaction, strand displacement amplification, or nucleic acid sequence-based
amplification, as explained in greater detail below. As commonly known in the art, the oligos are
designed to bind to a complementary sequence under selected conditions.

The nucleic acid (e.g. DNA or RNA) for practicing the present 
invention may be obtained according to well known methods. 

As used herein, the term "physiologically relevant" is meant to 
describe interactions which can take effect to modulate an activity or level of one or more 
proteins in their natural setting. 

The term "DNA" molecule or sequence (as well as sometimes the term 
"oligonucleotide") refers to a molecule comprised of the deoxyribonucleotides adenine (A), 
guanine (G), thymine (T) and/or cytosine (C), in a double-stranded form, and comprises or 
includes a "regulatory element" according to the present invention, as the term is defined herein. 
The term "oligonucleotide" or "DNA" can be found in linear DNA molecules or fragments, 
viruses, plasmids, vectors, chromosomes or synthetically derived DNA. As used herein, 
particular double-stranded DNA sequences may be described according to the normal convention 
of giving only the sequence in the 5' to 3' direction. 

"Nucleic acid hybridization" refers generally to the hybridization of 
two single-stranded nucleic acid molecules having complementary base sequences, which under
appropriate conditions will form a thermodynamically favored double-stranded structure.
Examples of hybridization conditions can be found in the two laboratory manuals referred to above
(Sambrook et al., 1989, supra and Ausubel et al., 1989, supra) and are commonly known in
the art. In the case of hybridization to a nitrocellulose filter, as for example in the well known
Southern blotting procedure, a nitrocellulose filter can be incubated overnight at 65°C with a
labeled probe in a solution containing 50% formamide, high salt (5 x SSC or 5 x SSPE), 5 x
Denhardt's solution, 1% SDS, and 100 µg/ml denatured carrier DNA (e.g., salmon sperm DNA).
The non-specifically binding probe can then be washed off the filter by several washes in 0.2 x
SSC/0.1% SDS at a temperature which is selected in view of the desired stringency: room
temperature (low stringency), 42°C (moderate stringency) or 65°C (high stringency). The
selected temperature is based on the melting temperature (Tm) of the DNA hybrid. Of course,
RNA-DNA hybrids can also be formed and detected. In such cases, the conditions of
hybridization and washing can be adapted according to well known methods by the person of
ordinary skill. Stringent conditions will be preferably used (Sambrook et al., 1989, supra).
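
For convenience, the wash conditions recited above can be captured as a small lookup. The sketch below is illustrative only; the names `WashStep`, `WASH_TEMPERATURES` and `wash_step` are hypothetical, and the values are limited to the buffer and the three temperatures stated in this paragraph.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class WashStep:
    buffer: str
    temperature: str


# Wash temperatures by stringency, as recited in the description.
WASH_TEMPERATURES = {
    "low": "room temperature",
    "moderate": "42°C",
    "high": "65°C",
}


def wash_step(stringency):
    """Return the wash buffer and temperature for a named stringency level."""
    try:
        temperature = WASH_TEMPERATURES[stringency]
    except KeyError:
        raise ValueError(f"unknown stringency: {stringency!r}") from None
    return WashStep(buffer="0.2 x SSC/0.1% SDS", temperature=temperature)


step = wash_step("high")
print(step.buffer, "at", step.temperature)
```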

Probes for nucleic acids can be utilized with naturally occurring 
sugar-phosphate backbones as well as modified backbones including phosphorothioates, 
dithionates, alkyl phosphonates and α-nucleotides and the like. Modified sugar-phosphate
backbones are generally taught by Miller, 1988, Ann. Reports Med. Chem. 23:295 and Moran et
al., 1987, Nucleic Acids Res., 14:5019. Probes of the invention can be constructed of
either ribonucleic acid (RNA) or deoxyribonucleic acid (DNA), and preferably of DNA.

It is an advantage of the present invention that the detection of the 
interaction between proteins and/or peptides be dependent on a label. Such labels provide 
sensitivity and often enable automation. In one embodiment of the present invention, automation 
is performed using CHIP technology. For example, the overlapping peptides spanning a chosen 
sequence of a protein, are bound to a CHIP which can then be used to automate a testing for 
interaction with proteins or peptides. Of course, it should be understood that the present 
invention is not strictly dependent on a design and synthesis of the overlapping set of peptides 
spanning a chosen protein sequence. Indeed, banks of peptides are available, from which this set 
of overlapping peptides could be constructed. 

Protein labelling is well-known in the art. Non-limiting examples of labels
include ³H, ¹⁴C, ³²P, and ³⁵S. Non-limiting examples of detectable markers include ligands,
fluorophores, chemiluminescent agents, enzymes, and antibodies. It will become evident to the
person of ordinary skill that the choice of a particular label dictates the manner in which it is
bound to the protein.

The identification of the interaction is not specifically dependent on labelling of the
proteins, since, for example, this interaction could be assessed using proteomic approaches (such
as 2-D gels and mass spectrometry) or using a library of antibodies.

As commonly known, radioactive amino acids can be
incorporated into peptides or proteins of the invention by several well-known methods. A
non-limiting example thereof includes in vitro or in vivo labelling of proteins using ³⁵S-Met.

The term "vector" is commonly known in the art and defines a plasmid DNA, 
phage DNA, viral DNA and the like, which can serve as a DNA vehicle into which DNA of the 
present invention can be cloned. Numerous types of vectors exist and are well known in the art. 

The term "expression" defines the process by which a gene is transcribed into 
mRNA (transcription), the mRNA then being translated (translation) into one polypeptide (or
protein) or more. 

The terminology "expression vector" defines a vector or vehicle as described 
above, but designed to enable the expression of an inserted sequence following transformation
into a host. The cloned gene (inserted sequence) is usually placed under the control of control 
element sequences such as promoter sequences. The placing of a cloned gene under such control 
sequences is often referred to as being operably linked to control elements or sequences. 

Operably linked sequences may also include two segments that are transcribed 
into the same RNA transcript. Thus, two sequences, such as a promoter and a "reporter
sequence" are operably linked if transcription commencing in the promoter will produce an RNA 
transcript of the reporter sequence. In order to be "operably linked" it is not necessary that two 
sequences be immediately adjacent to one another. 

Expression control sequences will vary depending on whether the vector is 
designed to express the operably linked gene in a prokaryotic or eukaryotic host or both (shuttle 
vectors) and can additionally contain transcriptional elements such as enhancer elements, 
termination sequences, tissue-specificity elements, and/or translational initiation and termination 
sites.

Prokaryotic expression is useful for the preparation of large quantities of
the protein encoded by the DNA sequence of interest. This protein can be purified according to
standard protocols that take advantage of the intrinsic properties thereof, such as size and charge
(e.g., SDS gel electrophoresis, gel filtration, centrifugation, ion exchange chromatography).
In addition, the protein of interest can be purified via affinity chromatography using polyclonal or
monoclonal antibodies. The purified protein can be used for therapeutic applications.

The DNA construct can be a vector comprising a promoter that is operably 
linked to an oligonucleotide sequence of the present invention, which in turn is operably linked
to a heterologous gene, such as the gene for the luciferase reporter molecule. "Promoter" refers
to a DNA regulatory region capable of binding directly or indirectly to RNA polymerase in a cell
and initiating transcription of a downstream (3' direction) coding sequence. For purposes of the
present invention, the promoter is bounded at its 3' terminus by the transcription initiation site and
extends upstream (5' direction) to include the minimum number of bases or elements necessary to
initiate transcription at levels detectable above background. Within the promoter will be found a
transcription initiation site (conveniently defined by mapping with S1 nuclease), as well as
protein binding domains (consensus sequences) responsible for the binding of RNA polymerase.
Eukaryotic promoters will often, but not always, contain "TATA" boxes and "CAAT" boxes.
Prokaryotic promoters contain Shine-Dalgarno sequences in addition to the -10 and -35 
consensus sequences. 

As used herein, the designation "functional derivative" denotes, in the context
of a functional derivative of a sequence, whether a nucleic acid or amino acid sequence, a
molecule that retains a biological activity (either functional or structural) that is substantially
similar to that of the original sequence. This functional derivative or equivalent may be a natural
derivative or may be prepared synthetically. Such derivatives include amino acid sequences
having substitutions, deletions, or additions of one or more amino acids, provided that the
biological activity of the protein is conserved. The same applies to derivatives of nucleic acid
sequences which can have substitutions, deletions, or additions of one or more nucleotides,
provided that the biological activity of the sequence is generally maintained. When relating to a
protein sequence, the substituting amino acid has chemico-physical properties which are
similar to those of the substituted amino acid. The similar chemico-physical properties include
similarities in charge, bulkiness, hydrophobicity, hydrophilicity and the like. The term
"functional derivatives" is intended to include "fragments", "segments", "variants", "analogs" or
"chemical derivatives" of the subject matter of the present invention.

As is well-known in the art, a "conservative mutation or substitution" of an
amino acid refers to a mutation or substitution which maintains: 1) the structure of the backbone
of the polypeptide (e.g., a beta sheet or alpha-helical structure); 2) the charge or hydrophobicity of
the amino acid; or 3) the bulkiness of the side chain. More specifically, the well-known
terminology "hydrophilic residues" relates to serine or threonine. "Hydrophobic residues" refers
to leucine, isoleucine, phenylalanine, valine or alanine. "Positively charged residues" relates to
lysine, arginine or histidine. "Negatively charged residues" refers to aspartic acid or glutamic
acid. Residues having "bulky side chains" are phenylalanine, tryptophan or tyrosine.
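
The residue groupings just recited lend themselves to a simple lookup. The following sketch is a simplification offered for illustration only: it treats a substitution as conservative when the two residues share at least one of the classes named above, which is an assumption of this example rather than a definition given herein. The names `RESIDUE_CLASSES`, `shared_classes` and `is_conservative` are hypothetical.

```python
# One-letter codes grouped according to the classes recited in the description.
RESIDUE_CLASSES = {
    "hydrophilic": set("ST"),
    "hydrophobic": set("LIFVA"),
    "positively charged": set("KRH"),
    "negatively charged": set("DE"),
    "bulky side chain": set("FWY"),
}


def shared_classes(a, b):
    """Return the sorted names of the classes that contain both residues."""
    return sorted(
        name for name, members in RESIDUE_CLASSES.items()
        if a in members and b in members
    )


def is_conservative(a, b):
    """True when two different residues share at least one class (assumed rule)."""
    return a != b and bool(shared_classes(a, b))


print(is_conservative("D", "E"))   # aspartic acid for glutamic acid
print(is_conservative("I", "L"))   # isoleucine for leucine
print(is_conservative("K", "D"))   # lysine for aspartic acid
```

Residues not listed in any class (e.g., glycine or proline) are never reported as conservative by this sketch, which reflects the limits of the five groupings rather than any property of those residues.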

Peptides, protein fragments, and the like in accordance with the present 
invention can be modified in accordance with well-known methods dependently or independently 
of the sequence thereof. For example, peptides can be derived from the wild-type sequence 
exemplified herein in the figures using conservative amino acid substitutions at 1, 2, 3 or more 
positions. The terminology "conservative amino acid substitutions" is well-known in the art and
relates to substitution of a particular amino acid by one having a similar characteristic (e.g.,
aspartic acid for glutamic acid, or isoleucine for leucine). Of course, non-conservative amino
acid substitutions can also be carried out, as well as other types of modifications such as
deletions or insertions, provided that these modifications modify the peptide in a suitable way
(e.g., without affecting the biological activity of the peptide if this is what is intended by the
modification). A list of exemplary conservative amino acid substitutions is given hereinbelow. 

TABLE 2

CONSERVATIVE AMINO ACID REPLACEMENTS

For Amino Acid   Code   Replace With

Alanine          A      D-Ala, Gly, Aib, β-Ala, Acp, L-Cys, D-Cys
Arginine         R      D-Arg, Lys, D-Lys, homo-Arg, D-homo-Arg,
                        Met, Ile, D-Met, D-Ile, Orn, D-Orn
Asparagine       N      D-Asn, Asp, D-Asp, Glu, D-Glu, Gln, D-Gln
Aspartic Acid    D      D-Asp, D-Asn, Asn, Glu, D-Glu, Gln, D-Gln
Cysteine         C      D-Cys, S-Me-Cys, Met, D-Met, Thr, D-Thr
Glutamine        Q      D-Gln, Asn, D-Asn, Glu, D-Glu, Asp, D-Asp
Glutamic Acid    E      D-Glu, D-Asp, Asp, Asn, D-Asn, Gln, D-Gln
Glycine          G      Ala, D-Ala, Pro, D-Pro, Aib, β-Ala, Acp
Isoleucine       I      D-Ile, Val, D-Val, AdaA, AdaG, Leu, D-Leu,
                        Met, D-Met
Leucine          L      D-Leu, Val, D-Val, AdaA, AdaG, Leu, D-Leu,
                        Met, D-Met
Lysine           K      D-Lys, Arg, D-Arg, homo-Arg, D-homo-Arg,
                        Met, D-Met, Ile, D-Ile, Orn, D-Orn
Methionine       M      D-Met, S-Me-Cys, Ile, D-Ile, Leu, D-Leu,
                        Val, D-Val
Phenylalanine    F      D-Phe, Tyr, D-Thr, L-Dopa, His, D-His,
                        Trp, D-Trp, Trans-3,4, or 5-phenylproline,
                        AdaA, AdaG, cis-3,4, or 5-phenylproline,
                        Bpa, D-Bpa
Proline          P      D-Pro, L-I-thioazolidine-4-carboxylic
                        acid, D- or L-1-oxazolidine-4-carboxylic
                        acid (Kauer, U.S. Pat. No. 4,511,390)
Serine           S      D-Ser, Thr, D-Thr, allo-Thr, Met, D-Met,
                        Met(O), D-Met(O), L-Cys, D-Cys
Threonine        T      D-Thr, Ser, D-Ser, allo-Thr, Met, D-Met,
                        Met(O), D-Met(O), Val, D-Val
Tyrosine         Y      D-Tyr, Phe, D-Phe, L-Dopa, His, D-His
Valine           V      D-Val, Leu, D-Leu, Ile, D-Ile, Met, D-Met,
                        AdaA, AdaG

As can be seen in this table, some of these modifications can be used to render
the peptide more resistant to proteolysis. Of course, modifications of the peptides can also be
effected without affecting the primary sequence thereof using an enzymatic or chemical
treatment, as is well-known in the art.

Thus, the term "variant" refers herein to a protein or nucleic acid
molecule which is substantially similar in structure and biological activity to the protein or
nucleic acid of the present invention. 

The functional derivatives of the present invention can be synthesized
chemically or produced through recombinant DNA technology, using methods well
known in the art. In one particular embodiment of the present invention, a variant according to
the present invention can be identified using a method of the present invention. It can also
be designed to formally test for the conservation of particular amino acids (e.g., by synthesizing a
variant or mutant peptide). These variants can also be tested as part of the full length sequence
of the protein in order to validate the interaction. Of course, the skilled artisan will understand
that having identified a region of a chosen protein as a region which is involved in high-affinity
protein interaction(s) enables an in vitro mutagenesis (or a testing of related peptide sequences)
of this region to identify and dissect the structure/function relationship of this region. Such methods
are well-known in the art. When the interaction domains of two proteins have been
identified, it is thus possible for the skilled artisan to identify and/or design variants having a
modified affinity for an interacting protein. Of course, when both interacting sequences are
known, very powerful questions can be asked to dissect the structure-function relationships which
govern the high-affinity interaction between same.

As used herein, "chemical derivatives" is meant to cover additional chemical 
moieties not normally part of the subject matter of the invention. Such moieties could affect the 
physico-chemical characteristic of the derivative (e.g. solubility, absorption, half life and the like, 
decrease of toxicity). Such moieties are exemplified in Remington's Pharmaceutical Sciences 
(e.g., 1980). Methods of coupling these chemical-physical moieties to a polypeptide are well
known in the art.

The term "allele" defines an alternative form of a gene which occupies a
given locus on a chromosome.

As commonly known, a "mutation" is a detectable change in the genetic 
material which can be transmitted to a daughter cell. As well known, a mutation can be, for
example, a detectable change in one or more deoxyribonucleotides. For example, nucleotides can
be added, deleted, substituted for, inverted, or transposed to a new position. Spontaneous
mutations and experimentally induced mutations exist. The result of a mutation of a nucleic
acid molecule is a mutant nucleic acid molecule. A mutant polypeptide can be encoded from this
mutant nucleic acid molecule. 

As used herein, the term "purified" refers to a molecule having been separated 
from a cellular component. Thus, for example, a "purified protein" has been purified to a level 
not found in nature. A "substantially pure" molecule is a molecule that is lacking in most other 
cellular components. 

As used herein, the terms "molecule", "compound" or "ligand" are used 
interchangeably and broadly to refer to natural, synthetic or semi-synthetic molecules or 
compounds. The term "molecule" therefore denotes for example chemicals, macromolecules, 
cell or tissue extracts (from plants or animals) and the like. Non-limiting examples of molecules 
include nucleic acid molecules, peptides, antibodies, carbohydrates and pharmaceutical agents. 
The agents can be selected and screened by a variety of means including random screening, 
rational selection and by rational design using for example protein or ligand modelling methods 
such as computer modelling, combinatorial library screening and the like. The terms "rationally
selected" or "rationally designed" are meant to define compounds which have been chosen based
on the configuration of the interaction domains of the present invention. As will be understood
by the person of ordinary skill, macromolecules having non-naturally occurring modifications are
also within the scope of the term "molecule". For example, peptidomimetics, well known in the
pharmaceutical industry and generally referred to as peptide analogs, can be generated by
modelling as mentioned above. Similarly, in a preferred embodiment, the polypeptides of the
present invention are modified to enhance their stability. It should be understood that in most
cases this modification should not alter the biological activity of the interaction domain. The
molecules identified in accordance with the teachings of the present invention have a therapeutic
value in diseases or conditions in which the physiology or homeostasis of the cell and/or tissue is
compromised by a high-affinity protein interaction identified in accordance with the present
invention. Alternatively, the molecules identified in accordance with the teachings of the present
invention find utility in the development of more efficient agents which can modulate such
interactions.

Libraries of compounds (publicly available or commercially available, e.g., a
combinatorial library) are well known in the art. Libraries of peptides are also available. Such
libraries can be used to build an overlapping set of peptide sequences spanning a chosen domain, 
protein or part thereof. 
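
By way of illustration only, an overlapping set of peptide sequences spanning a chosen segment could be enumerated as in the following minimal sketch. The function name, the default window of six residues with a one-residue offset, and the ten-residue example segment are assumptions made for the example; the segment is an arbitrary string of single-letter codes and is not a sequence of the present invention.

```python
def overlapping_peptides(sequence, length=6, step=1):
    """Return (start_position, peptide) pairs covering `sequence`.

    Positions are 1-based. With step=1, consecutive peptides differ by
    one amino acid at each end (a sliding window).
    """
    if length <= 0 or step <= 0:
        raise ValueError("length and step must be positive")
    return [
        (i + 1, sequence[i:i + length])
        for i in range(0, len(sequence) - length + 1, step)
    ]


# Arbitrary, hypothetical segment used only to show the windowing.
segment = "ACDEFGHIKL"
for start, peptide in overlapping_peptides(segment):
    print(start, peptide)
```

A segment shorter than the window yields an empty list, so the caller can detect that no peptide of the requested length exists.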

As used herein, the recitation "indicator cells" refers to cells that express, in
one particular embodiment, two interacting peptide domains of the present invention, and 
wherein an interaction between these proteins or interacting domains thereof is coupled to an 
identifiable or selectable phenotype or characteristic such that it provides an assessment or 
validation of the interaction between same. Such indicator cells can also be used in the screening 
assays of the present invention. In certain embodiments, the indicator cells have been engineered 
so as to express a chosen derivative, fragment, homolog, or mutant of these interacting domains. 
The cells can be yeast cells or higher eukaryotic cells such as mammalian cells (WO 96/41169). 
In one particular embodiment, the indicator cell is a yeast cell harboring vectors enabling the use 
of the two hybrid system technology, as well known in the art (Ausubel et al., 1994, supra) and
can be used to test a compound or a library thereof. In one embodiment, a reporter gene 
encoding a selectable marker or an assayable protein can be operably linked to a control element 
such that expression of the selectable marker or assayable protein is dependent on the interaction 
of the Protein A and Protein B interacting domains. Such an indicator cell could be used to 
rapidly screen at high-throughput a vast array of test molecules. In a particular embodiment, the 
reporter gene is luciferase or β-Gal.



In one embodiment, at least one of the two interacting proteins or domains of 
the present invention may be provided as a fusion protein. The design of constructs therefor and 
the expression and production of fusion proteins are well known in the art (Sambrook et al.,
1989, supra, and Ausubel et al., 1994, supra). In a particular embodiment, both interaction
domains are part of fusion proteins. A non-limiting example of such fusion proteins includes a 
LexA-Protein A fusion (DNA-binding domain-Protein A; bait) and a B42-Protein B fusion 
(transactivator domain-Protein B; prey). In yet another particular embodiment, the LexA-Protein 
A and B42-Protein B fusion proteins are expressed in a yeast cell also harboring a reporter gene 
operably linked to a LexA operator and/or LexA responsive element. Of course, it will be 
recognized that other fusion proteins can be used in such two-hybrid systems. Furthermore, it
will be recognized that the fusion proteins need not contain the full-length interacting proteins. 
Indeed, fragments of these polypeptides, provided that they comprise the interacting domains, 
can be used in accordance with the present invention, as evidenced with the peptide spanning 
method of the present invention. 

Non-limiting examples of such fusion proteins include
hemagglutinin fusions, glutathione-S-transferase (GST) fusions and maltose
binding protein (MBP) fusions. In certain embodiments, it might be beneficial to introduce a
protease cleavage site between the two polypeptide sequences which have been fused. Such 
protease cleavage sites between two heterologously fused polypeptides are well known in the art. 

In certain embodiments, it might also be beneficial to fuse the interaction 
domains of the present invention to signal peptide sequences enabling a secretion of the fusion 
protein from the host cell. Signal peptides from diverse organisms are well known in the art. 
Bacterial OmpA and yeast Suc2 are two non-limiting examples of proteins containing signal
sequences. In certain embodiments, it might also be beneficial to introduce a linker (commonly
known) between the interaction domain and the heterologous polypeptide portion. Such fusion
proteins find utility in the assays of the present invention as well as for purification purposes,
detection purposes and the like.

For certainty, the sequences and polypeptides useful to practice the invention
include, without being limited thereto, mutants, homologs, subtypes, alleles and the like. It shall
be understood that generally, the sequences of the present invention should encode a functional
(albeit defective) domain. It will be clear to the person of ordinary skill that whether an
interaction domain, variant, derivative, or fragment thereof, of the
present invention retains its function in binding to its partner can be readily determined by using
the teachings and assays of the present invention and the general teachings of the art.

As exemplified herein below, the interaction domains of the present invention 
can be modified, for example by in vitro mutagenesis, to dissect the structure-function 
relationship thereof and permit a better design and identification of modulating compounds. 
However, some derivatives or analogs having lost their biological function of interacting with
their respective interaction partner may still find utility, for example for raising antibodies. Such 
analogs or derivatives could be used for example to raise antibodies to the interaction domains of 
the present invention. These antibodies could be used for detection or purification purposes. In 
addition, these antibodies could also act as competitive or non-competitive inhibitors and be 
found to be modulators of an interaction identified in accordance with the present invention. 

A host cell or indicator cell has been "transfected" by exogenous or 
heterologous DNA (e.g. a DNA construct) when such DNA has been introduced inside the cell. 
The transfecting DNA may or may not be integrated (covalently linked) into chromosomal DNA 
making up the genome of the cell. In prokaryotes, yeast, and mammalian cells for example, the 
transfecting DNA may be maintained on an episomal element such as a plasmid. With respect to
eukaryotic cells, a stably transfected cell is one in which the transfecting DNA has become 
integrated into a chromosome so that it is inherited by daughter cells through chromosome 
replication. This stability is demonstrated by the ability of the eukaryotic cell to establish cell 
lines or clones comprised of a population of daughter cells containing the transfecting DNA. 
Transfection methods are well known in the art (Sambrook et al., 1989, supra; Ausubel et al.,
1994, supra). The use of a mammalian cell as indicator can provide the advantage of
furnishing an intermediate factor, which permits or modulates the interaction of two polypeptides
which are tested, that might not be present in lower eukaryotes or prokaryotes. Of course, such an
advantage might be rendered moot if both polypeptides tested directly interact. It will be
understood that extracts from mammalian cells for example could be used in certain
embodiments, to compensate for the lack of certain factors in a chosen indicator cell. It shall be
realized that the field of translation provides ample teachings of methods to prepare and
reconstitute different types of extracts.

In general, techniques for preparing antibodies (including monoclonal 
antibodies and hybridomas) and for detecting antigens using antibodies are well known in the art 
(Campbell, 1984, In "Monoclonal Antibody Technology: Laboratory Techniques in Biochemistry
and Molecular Biology", Elsevier Science Publisher, Amsterdam, The Netherlands) and in
Harlow et al., 1988 (in: Antibody A Laboratory Manual, CSH Laboratories). The present
invention also provides polyclonal, monoclonal antibodies, or humanized versions thereof, 
chimeric antibodies and the like which inhibit or neutralize their respective interaction domains 
and/or are specific thereto. 

From the specification and appended claims, the term "therapeutic agent"
should be taken in a broad sense so as to also include a combination of at least two such
therapeutic agents. Further, the DNA segments or proteins according to the present
invention can be introduced into individuals in a number of ways. For example, erythropoietic 
cells can be isolated from the afflicted individual, transformed with a DNA construct according 
to the invention and reintroduced to the afflicted individual in a number of ways, including 
intravenous injection. Alternatively, the DNA construct can be administered directly to the 
afflicted individual, for example, by injection in the bone marrow. The therapeutic agent can 
also be delivered through a vehicle such as a liposome, which can be designed to be targeted to a 
specific cell type, and engineered to be administered through different routes. 

For administration to humans, the prescribing medical professional will 
ultimately determine the appropriate form and dosage for a given patient, and this can be 
expected to vary according to the chosen therapeutic regimen (e.g. DNA construct, protein, 
molecule), the response and condition of the patient as well as the severity of the disease.

Compositions within the scope of the present invention should contain the
active agent (e.g. protein, nucleic acid, or molecule) in an amount effective to achieve the desired
therapeutic effect while avoiding adverse side effects. Typically, the nucleic acids in accordance
with the present invention can be administered to mammals (e.g. humans) in doses ranging from
0.005 to 1 mg per kg of body weight per day of the mammal which is treated. Pharmaceutically
acceptable preparations and salts of the active agent are within the scope of the present invention
and are well known in the art (Remington's Pharmaceutical Science, 16th Ed., Mack Ed.). For
the administration of polypeptides, antagonists, agonists and the like, the amount administered 
should be chosen so as to avoid adverse side effects. The dosage will be adapted by the clinician 
in accordance with conventional factors such as the extent of the disease and different parameters 
from the patient. Typically, 0.001 to 50 mg/kg/day will be administered to the mammal. 
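
Purely as an arithmetic illustration of the per-kilogram ranges recited above, the corresponding total daily amounts for a given body weight could be computed as in the sketch below. The function name and the 70 kg body weight are assumptions chosen for the example; the two ranges are those stated in the preceding paragraph, and the choice of dose remains with the clinician as described.

```python
def daily_dose_range_mg(body_weight_kg, low_mg_per_kg, high_mg_per_kg):
    """Convert a mg/kg/day range into a total mg/day range for one subject."""
    if body_weight_kg <= 0:
        raise ValueError("body weight must be positive")
    if low_mg_per_kg > high_mg_per_kg:
        raise ValueError("low end of the range exceeds the high end")
    return (body_weight_kg * low_mg_per_kg, body_weight_kg * high_mg_per_kg)


# Ranges recited in the text (mg per kg of body weight per day).
NUCLEIC_ACID_RANGE = (0.005, 1.0)
POLYPEPTIDE_RANGE = (0.001, 50.0)

# Hypothetical 70 kg subject, used only to show the multiplication.
for label, (low, high) in (("nucleic acid", NUCLEIC_ACID_RANGE),
                           ("polypeptide", POLYPEPTIDE_RANGE)):
    lo_mg, hi_mg = daily_dose_range_mg(70.0, low, high)
    print(f"{label}: {lo_mg:.2f} to {hi_mg:.2f} mg/day")
```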

The methods and assays of the present invention have also been validated 
with Annexin. This protein is significantly different from P-glycoprotein in both structure and 
function. Consequently, together with the knowledge of protein chemistry and molecular 
biology, these validations support the utility of the instant assays and methods for all proteins 
(from viruses, living cells, animals, plants, etc.).

BRIEF DESCRIPTION OF THE DRAWINGS 

Having thus generally described the invention, reference will now be made to the 
accompanying drawings, showing by way of illustration a preferred embodiment thereof, and in 
which: 

Figure 1 shows the principle of protein-protein interaction. The plus signs (+) indicate
the regions of high-affinity binding. The minus signs (-) indicate the regions of high-repulsive
forces. As indicated in the text, interactions between two proteins are made up of discontinuous
regions of high-affinity binding and high-repulsive forces that are almost in equilibrium, with
high-affinity binding being more favoured while proteins are together.

Figure 2 is a schematic representation of a method of identification of high-affinity
binding sequences according to one embodiment of the present invention. A, the
different shapes represent different proteins in a total cell lysate. The signs are as in Figure 1.
B, small overlapping peptides that cover the entire sequence (or a segment) of Protein A will be
synthesized directly on derivatized wells of 96-well polypropylene plates. Following peptide
synthesis, metabolically radiolabeled total cell lysate is added to each well containing the various
peptides and incubated in an incubation buffer. C, the dark filled circles represent the
radiolabeled proteins from total cell lysate isolated from metabolically radiolabeled cells added to
all the wells of the 96-well plates to identify high-affinity binding sequences on Protein A. D,
after an extensive washing, the high-affinity binding sequences (overlapping peptides from
Protein A) are in those wells that bind radiolabeled proteins (in dark). Four high-affinity binding
sequences between Protein A and another protein(s) are identified in rows 1, 3, 6 and 8. The
wells that contain the high-affinity binding sequences are identified by radioactivity counting and
SDS-PAGE.

Figure 3 is a schematic representation of a method of identification of high-affinity
binding sequences according to another embodiment of the present invention. A shows a
schematic representation of the interaction between Protein A and Protein B. B, small
overlapping peptides that cover the entire sequence (or a segment) of Protein A will be
synthesized directly on derivatized wells of 96-well polypropylene plates. Following peptide
synthesis, a radiolabeled Protein B (synthesized from an in vitro transcription-translation reaction
mix) is added to each well containing the various peptides and incubated in an incubation
buffer. C, the dark filled circles represent the radiolabeled Protein B that has been added to all
the wells of the 96-well plates to identify high-affinity binding sequences on Protein A. D, after
a washing procedure, the high-affinity binding sequences are in those wells in which Protein B
(radiolabeled protein in dark) is still bound to the peptides from Protein A. E, four high-affinity
binding sequences between Protein A and Protein B are identified in rows 1, 3, 6 and 8. The
wells that contain the high-affinity binding sequences are identified by radioactivity counting and
SDS-PAGE.

Figure 4 is a schematic representation of a method of selection of drugs that
specifically inhibit the binding of Protein A to Protein B according to one embodiment of the present
invention. A shows a schematic representation of the interaction between Protein A and Protein
B. B, peptides that encode high-affinity binding sequences are used as LEAD sequences for the
selection of specific drugs that inhibit the association between Protein A and Protein B and
ultimately the function of the complex. To target the high-affinity binding sequences that were
identified in Figures 2 or 3, peptides encoding one of the high-affinity binding sequences are
synthesized in every well of the 96-well plate. Grey circles represent one of four high-affinity
binding sequences identified in Figures 2 and 3. C, following the addition of a compound to be
tested to each well of the 96-well plate, a radiolabeled Protein B is added to each of the wells.
Of course, combinatorial libraries can be screened to identify drugs that bind specifically to the
high-affinity binding sequences of Protein A. As previously stated, radiolabeled Protein B from a
transcription-translation reaction mix is represented. Plates are washed and drugs that
specifically bind to high-affinity sequences of Protein A are found in those wells that do not
contain radiolabeled Protein B. D, wells containing drugs/compounds that bind specifically to
one of the high-affinity binding sequences in Protein A and therefore prevent the binding of
Protein B are identified by the absence of a dark circle (i.e., wells 28, 70 and 75). Selected
drugs/compounds represent invaluable LEAD compounds that can be used in biological assays to
confirm their mechanism of action. Validated drugs can proceed toward in vivo studies.

Figure 5 shows a P-glycoprotein predicted secondary structure and the amino acid
sequence of the linker domain. A schematic representation of the predicted secondary structure
of P-gp is shown. The
twelve filled squares represent the twelve putative transmembrane domains. The two ATP
binding domains are represented by two circles in the N- and C-terminal halves of P-gp. The
inset represents the linker domain. The amino acid sequence of the linker domains of Human
P-gp1 (HP-gp1) and HP-gp3 is indicated as a single-letter amino acid code. The numbers in
brackets at the beginning and end of each amino acid sequence of HP-gp1 and HP-gp3 show the
length of the linker domains (1-90 and 1-88 for HP-gp1 and HP-gp3, respectively). The
numbered lines underneath the amino acid sequence show the sequences of the overlapping 
hexapeptides, which differ by one amino acid. For HP-gp3, the last hexapeptide is number 88.

Figure 6 shows the protein binding to overlapping hexapeptides encoding
the HP-gp1 linker domain. Overlapping hexapeptides that encode the linker domain of HP-gp1
were synthesized on polypropylene rods and used to identify proteins that bind to these peptides.
A total of 90 plus two control hexapeptides for HP-gp1 were incubated with total cell lysate
from [35S] methionine metabolically labeled cells (see methods). All bound proteins were eluted
from the peptide-fixed rods and resolved on 10% SDS-PAGE. Lanes 1 to 92 show the [35S]
methionine bound proteins from HP-gp1. The migration of the molecular weight markers is
shown to the left of the gels.

Figure 7 shows the effects of different detergents or high salt on the binding of proteins to
HP-gp1 hexapeptides. Metabolically radiolabeled proteins bound to hexapeptides
(hexapeptides 50 to 53) from the HP-gp1 linker domain were eluted in the presence of increasing
concentrations of anionic detergent (0.12% - 0.5% SDS), zwitterionic detergent (20 mM - 80
mM CHAPS) or salt (0.3 M - 1.2 M KCl). The y-axis represents the amount of radioactivity
eluted from a pool of three hexapeptides (50 to 53).

Figure 8 shows the effects of CHAPS on the binding of proteins to the overlapping
hexapeptides encoding the HP-gp1 linker domain. Overlapping hexapeptides of the linker domain
of HP-gp1 were incubated with total cell lysate from [35S] methionine metabolically labeled cells
extracted with 10 mM CHAPS. Bound proteins were eluted from the peptide-fixed rods and
resolved by 10% SDS-PAGE. Lanes 1 to 92 show the [35S] methionine bound proteins to
the HP-gp1 linker domain. The migration of the molecular weight markers is shown to the left of
the gels.

Figure 9 shows the protein binding to overlapping hexapeptides encoding the HP-gp3 linker
domain. Overlapping hexapeptides that encode the linker domain of HP-gp3 were synthesized on
polypropylene rods and used to identify proteins that bind to these peptides. A total of 88 plus
two control hexapeptides for HP-gp3 were incubated with total cell lysate from [35S]
methionine metabolically labeled cells. All bound proteins were eluted from the peptide-fixed
rods and resolved on 10% SDS-PAGE. Lanes 1 to 90 show the [35S] methionine bound proteins
from HP-gp3. The migration of the molecular weight markers is shown to the left of the gels.

Figure 10 shows the sequence alignment of three binding regions of the HP-gp1 and
HP-gp3 linker domains. Alignment of the HP-gp1 and HP-gp3 linker domains is shown using
a single-letter code for amino acids. The regions of high binding affinities for HP-gp3 and
HP-gp1 are shown in bold. Identical amino acids are shown by single letter code between the
two aligned sequences. Conserved amino acids are indicated by a plus (+) sign. The numbers on
each side of the amino acid sequence of the linker domains refer to the amino acid sequence of
human P-gp1 and 3 as in (732Q, §9H1).

Figure 11 shows the two high-affinity binding hexapeptides. Two high-affinity
binding sequences 658 RSSLIR 663 (SEQ ID NO: 7) and 669 SVRGSQ 674 (SEQ ID NO: 8) from
the HP-gp1 linker domain were resynthesized and incubated with total cell lysate from [35S]
methionine metabolically labeled cells for 24 hour or 48 hour incubation times. Bound
proteins were eluted from peptide-fixed rods and resolved by 10% SDS-PAGE. The
migration of the molecular weight markers is shown to the left of the figure.

Figure 12 shows the effects of different carrier proteins as blocking agents of
nonspecific binding. Total cell lysates from [35S] methionine metabolically labeled CEM cells
were used as is or made 1% gelatin, 0.3% BSA or 3% BSA. The cell lysates were incubated with
a high-affinity binding hexapeptide 658 RSSLIR 663 (SEQ ID NO: 7) from the HP-gp1 linker
domain. The bound proteins were eluted with SDS sample buffer and resolved on 10% SDS-PAGE.
The migration of the molecular weight markers is shown to the left of the figure.

Figure 13 shows the purification of a 57 kDa protein. Total cell lysate was
incubated with fifty HP-gp1 hexapeptides 658 RSSLIR 663 (SEQ ID NO: 7) and 669 SVRGSQ 674
(SEQ ID NO: 8). Samples containing the 57 kDa protein (P57) from one hundred hexapeptide
incubation mixes were pooled and resolved by 10% SDS-PAGE. The resolved proteins were
transferred to PVDF membrane and stained with Ponceau S. The migration of the molecular
weight markers is shown to the right of the figure.

Figure 14 shows a Western blot analysis with anti-tubulin monoclonal
antibodies. Total cell lysate from CEM cells and proteins eluted from the high-affinity binding
hexapeptides of the HP-gp1 linker domain (P57) were resolved on SDS-PAGE and transferred to
nitrocellulose membrane. One half of the membrane was probed with anti-α and anti-β tubulin
monoclonal antibodies. The migration of the molecular weight markers is shown to the left of the
figure.

Figure 15 shows the helical wheel presentations of the high-affinity binding region
of the HP-gp1 and HP-gp3 linker domains. The single-letter amino acid code for the high-affinity
binding region of the HP-gp1 and HP-gp3 linker domains is shown. The positively charged
amino acids on one side of the helix have been circled.

Other objects, advantages and features of the present invention will become more 
apparent upon reading of the following non-restrictive description of preferred embodiments
with reference to the accompanying drawings which is exemplary and should not be interpreted 
as limiting the scope of the present invention. 

DESCRIPTION OF THE PREFERRED EMBODIMENT 

The function or functions of proteins are mediated through an interaction thereof
with other cellular or extracellular proteins. Until now it was thought that interactions between 
two proteins involve large segments of polypeptides that have complementary amino acid 
sequences. However, it is not known how these complementary sequences mediate the 
interactions between proteins. In this application, a novel concept to explain the principle of 
protein-protein interactions is proposed. Briefly, interactions between any two or more proteins 
are mediated by strings of discontinuous sequences with high-affinity binding and high-repulsive
forces (see Figure 1). The sum of these forces over the entire exposed sequence of proteins
determines the nature and extent of the interactions between proteins. The sizes of these 
interacting domains can vary from 5 to 25 amino acids in length. The attractive forces between 
two small high-affinity binding sequences are generally larger than the sum of all the high-affinity
binding and repulsive forces between two proteins. Therefore, using the present
approach, it is possible to isolate interacting proteins from a mixture of proteins using a short
peptide (almost six amino acids) that encodes only the high-affinity binding sequence. Indeed,
with this in mind, it is now easy to see why many methods attempting to isolate interacting
proteins have failed. The use of large fragments or proteins to isolate interacting proteins is less
efficient since the sum of attractive/repulsive forces is much weaker than any string of attractive
forces. The herein proposed principle is also consistent with the fact that protein-protein 
interactions can be modulated by post-translation modifications (e.g., by phosphorylation (429^ 
and the presence of other interacting proteins (44^Q). Hence, the addition or loss of weak forces 
following post-translation modification can disrupt the tenuous balance between high-affinity
binding and high-repulsive forces that hold proteins together or prevent their association. Support
for the magnitude of attractive forces between two high-affinity binding sequences is
demonstrated in antibody-antigen binding whereby the antigen can be only of a few amino acids 
Q6, 27). Furthermore, numerous examples exist in biology where cellular interactions between 
proteins occur due to the presence of small consensus sequences of five to ten amino acids.
Non-limiting examples of such small consensus sequences include the leucine zipper (4£3), and SH2
and SH3 binding sequences (4£3, 4^§fl). In addition to the domains of interactions between two 
or more proteins (indicated above), protein-protein interactions can have many measurable 
effects, such as: i) changes in the kinetic properties of one or both proteins (24^2, 23§4); ii) 
formation of new binding or functional sites (44^5, 2§1Q4); and iii) the inactivation of 
function(s) fg7106. 30114). In other words, a given protein could expose different functional 
domains or sequences in the presence (as opposed to the absence) of any interacting proteins.
Thus, in the presence of protein B, protein A can expose other sequences not previously exposed
for interactions with other proteins (44£5, 2^84, 25104, 27 3 0 106. 114). The latter
concept is very important as it argues against the effectiveness of some structural studies (i.e.,
X-ray and NMR) in predicting functional or surface exposed domains from the resolved crystal
structure of proteins. By enabling the measurement and the identification of potentially all the 
high-affinity binding sites of a given protein, the present invention seeks to overcome the
drawbacks of the results obtained from such structural studies. 

Further to the above examples of protein-protein interactions, a subset of protein- 
protein interactions is dimerization. There is an abundance of examples in biology whereby 
protein-protein interactions are essential for activation or inhibition of function (4G52). Non-
limiting examples of homo- or heterodimers include: growth factor receptors (©52); membrane
transport proteins ( 29- 36 . 7r4€£); tumor suppressor proteins (4€22); and proteins that mediate
apoptosis (33S2). In fact, dynamic dimerization is a common theme in the regulation of signal
transduction. Some of the functional consequences of dimerization include increased proximity
for activation of single transmembrane cell surface receptors [e.g., EGF receptor (952)] and
differential regulation by heterodimerization [e.g., BCL2 family of proteins (23£I)].

The protein concentration in living cells is very high and is in the range of 10-30 
mg/ml. At this high protein concentration, most if not all proteins should interact precisely and 
specifically with other cellular proteins. Some of the interacting proteins act as inhibitors of 
function, while others may be activators (e.g., the BCL2-BAX family of proteins (23j8D)).
Moreover, the cycling of a given protein between activator and inhibitor association will require 
the association-dissociation process to occur rapidly. For example, when protein X is associated 
with an inhibitor protein I, the domains (small sequences) that are required for the association of 
protein X with an activator protein A may not be easily accessible in the X-I complex. 
Therefore, current methods to identify associated proteins (i.e., the two-hybrid system and similar
approaches) may not be able to identify all associated proteins. In other words, current methods,
when successful, may only identify some but not all functional domains and their associated
proteins. By contrast, using the peptide scanning approach, the method of the present invention
is capable of identifying all functional domains or high-affinity interacting domains of protein X
and its associated proteins. Once the associated proteins are identified, their biological functions
as they relate to the target protein X can be tested. Thus, for a given interacting protein, should its
interaction with one or many possible associated proteins prove to be important for function, the
high-affinity binding sequences (between protein X and Protein I or A) can be easily identified
and can be used as target sites in high throughput drug screening assays (see below) or other
assays. 

This invention includes the concept (described in Figures XA-XD) that protein-protein
interactions are made up of discontinuous high-affinity binding and high-repulsive forces
scattered throughout the 3^0 sequence of proteins and that these sequences can be isolated using
one of many possible approaches indicated herein (e.g., the overlapping peptide approach).
Although the overlapping peptide approach is exemplified in this application, other approaches
can be envisioned that give similar results. It should be stressed that the approach described
herein is immune to conformational changes resulting from interacting proteins that could affect
other commonly used methods to identify protein-protein interactions (e.g., two-hybrid system,
affinity blotting, and crosslinking). In the two-hybrid system, for example, protein A is fused
with another protein sequence (the DNA-bound "bait" protein) and the other interacting protein is
fused to the activation-domain containing "prey" protein. The fusion of interacting proteins to
protein A could expose regions other than those found in the native conformation, which will
affect their interactions. Furthermore, the two-hybrid system has several disadvantages, some of
which are listed below:

i. The interaction of proteins is monitored in the nuclear milieu rather than the 
cytoplasm where most proteins are found. 

ii. Proteins can be toxic when expressed in different cells or organisms. 

iii. The interactions between two proteins in a complex in the two-hybrid system can 
sterically exclude the binding of other interacting proteins. 

iv. The post-translational modification of one protein can exclude its interaction with 
other proteins. 

v. The two-hybrid system does not allow the simultaneous identification of the 
precise amino acid sequences between two interacting proteins. 

vi. The application of the two-hybrid system is associated with a high percentage of false
positives.

vii. The two-hybrid system cannot be easily applied to different cell types or tissues whereby 
different interacting proteins may be expressed (this can be a critical drawback of this system). 

Method to Identify Interacting Proteins and Sites of Interactions for Protein A 

The present approach and methodology used to identify discontinuous strings of
sequences between two or more interacting proteins is a scanning overlapping peptide approach.
Using this approach, a large number of short overlapping peptides which cover the entire amino
acid sequence of a given protein, "the bait", are synthesized in parallel on an inert solid
support (see Figure 2). The rationale for synthesizing a large number of overlapping peptides as
opposed to a discontinuous peptide library is that one does not know a priori
which exact sequences of a given protein will contain the high affinity binding sites and the
repulsive sequences. Therefore, a discontinuous peptide approach will often lead to the presence
of both high affinity binding sequences and repulsive sequences in the same peptide. Such
peptides will not bind to potential interacting proteins with high affinity. Moreover, the use of
overlapping peptides also provides internal controls for nonspecific binding. For example, using
overlapping peptides, the high affinity binding sequences will give a peak of signal where
peptides within the high affinity domain contain the high affinity amino acid sequences but
lack the amino acids which provide the repulsive forces (see Figure 6 in Example 1). Of course,
it should be understood that the present invention is not dependent on a spanning of the full
protein sequence. Indeed, sub-region(s) of a protein can be used. In addition, overlapping
peptides can be derived from a chosen domain of a protein. Also, it is conceivable to
probe an overlapping peptide set of a first protein with an overlapping peptide set of a second
protein.
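
The scanning step described above can be sketched in a few lines of Python. This is only an illustrative sketch: the function name `overlapping_peptides`, the default window of six residues and the one-residue offset between consecutive peptides are assumptions made here for illustration (the window could equally be 5 to 7 residues). The example input is a 20-residue stretch of the P-glycoprotein linker domain quoted in Example 8.

```python
def overlapping_peptides(sequence, length=6, step=1):
    """Return every peptide of `length` residues, offset by `step` residues.

    `length` and `step` are illustrative defaults (hexapeptides shifted by
    one residue); they are assumptions, not values fixed by the method.
    """
    if length <= 0 or step <= 0:
        raise ValueError("length and step must be positive")
    sequence = sequence.strip().upper()
    last_start = len(sequence) - length
    # A sequence shorter than `length` gives an empty list.
    return [sequence[i:i + length] for i in range(0, last_start + 1, step)]


# A 20-residue stretch of the P-glycoprotein linker domain taken from the text.
bait_segment = "SRSSLIRKRSTRRSVRGSQA"
peptides = overlapping_peptides(bait_segment)
for number, peptide in enumerate(peptides, start=1):
    print(number, peptide)
```

With these assumed settings, a segment of n residues yields n - 5 hexapeptides, so the 20-residue segment gives 15 peptides.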

To demonstrate how one can use this approach of overlapping peptides as "a bait"
to isolate interacting proteins, "the prey" or "preys", from a mixture of total cell proteins, the
following example can be considered. P-glycoprotein is a membrane protein that confers
resistance to anticancer drugs and therefore is responsible for the failure of chemotherapy.
Although P-glycoprotein has been shown to function by preventing the accumulation of
chemotherapeutic drugs in tumor cells, the exact mechanism by which this protein functions and
the identity of the associated proteins that modulate its function are not known. Thus, it is of interest
to identify proteins that interact with P-glycoprotein, so as to enable an inhibition of binding
between P-glycoprotein and its associated proteins, thereby potentially modulating its function in
resistant tumor cells. In this example, it was of interest to identify those proteins which bind to
the linker domain of P-glycoprotein. Thus, in this particular example, a domain of a chosen
protein was used. The linker domain is a region of about 90 amino acids. Thus,
overlapping hexapeptides covering this entire linker sequence of P-glycoprotein were synthesized
onto a solid support using standard Fmoc chemistry (+74). The covalently fixed peptides (on a
solid support) were incubated with a total cell lysate isolated from cells metabolically labeled with
[35S]methionine. The peptides and total cell lysate were incubated in the presence of a carrier
substrate (1-3% bovine serum albumin, 1-3% gelatin, 1-3% skim milk, etc.) for 18
hours at 4°C. Following this incubation period, the covalently fixed peptides were washed
extensively with isotonic buffer. Any proteins from the radiolabeled total cell lysate which
maintained their association with the overlapping hexapeptides following the washing step were
eluted in SDS-containing sample buffer and analyzed by SDS polyacrylamide gel electrophoresis
(SDS-PAGE) (4£2). The presence of radiolabeled proteins on SDS polyacrylamide gels,
following gel drying and signal enhancement, provides the following information:

1) those specific overlapping peptides represent high affinity binding sequences in the P- 
glycoprotein linker domain (or other chosen domains or non-chosen domains); and 
2) the proteins bound to the specific overlapping peptides are associated proteins (see
Figure 6).

The associated proteins which bound to the high affinity binding sequences can be isolated in
large quantities for the purpose of determining their identity by N-terminal amino acid
sequencing by Edman degradation (27) or the like. Briefly, the sequences of the overlapping
peptides that bound a given protein are resynthesized on a solid support and kept fixed thereto.
Total cell lysate from cells metabolically radiolabeled with [35S]methionine is added to the solid
support containing the fixed high affinity sequence peptides and incubated as described above.
Following washing steps to remove unbound material, the associated protein is isolated in large
amounts following an elution step with SDS-containing buffers (see below). The purified
associated protein is now ready for amino acid sequencing. Of course, should further purification
steps be required, they are well known to the skilled artisan. The purified protein is run on SDS
polyacrylamide gels and the resolved protein is transferred to PVDF membrane as
previously described (1M). Other methods for amino acid sequence determination can also be
easily applied (2 67. 33).



Method to Identify the Amino Acid Sequences Between Two Interacting Proteins 

The same concept as described above can be applied if one is only interested in
identifying the high affinity binding sequences between two proteins. A non-limiting example of
two such proteins is p53 and MDM and their regions of interaction (32§, 8 4103).
Specifically, therefore, the purpose of this exercise is to identify the high affinity binding
sequences between protein A (p53) and protein B (MDM) in order to use these sequences as
target sites for the identification of compounds that modulate this interaction and more
particularly for the development of drugs. Thus, in one embodiment, when a given drug is bound
to one of these high affinity binding sites on protein A, it will prevent the formation of the active
complex (protein A+B) and therefore inhibit the functions of the complex. To isolate the string
of high-affinity binding sequences between proteins A and B (see Figure 3), small overlapping
peptides (5 to 7 amino acids) that cover the entire amino acid sequence of protein A, "the bait",
will be synthesized in parallel onto a solid support (as mentioned above and described in more
detail in Example 3). Note that, in this particular embodiment, only the primary amino acid
sequence of protein A, "the bait", is needed. Once the peptides are synthesized (peptide synthesis
is done in parallel on a solid support in 96-well plates), an enriched and radiolabeled full-length
protein B, "the prey" (radiolabeled protein B is easily obtained from in vitro
transcription-translation reactions; 4-434 4V HI 8^), is added to each well of the 96-well plate that
contains the covalently fixed overlapping peptides. The peptides covering protein A are
incubated with radiolabeled protein B to allow binding to occur. Following an incubation
period (5 to 24 hours), unbound radiolabeled protein B will be removed by extensive washing in
isotonic buffer. Any radiolabeled protein B that remains bound to the overlapping peptides will be eluted in
the presence of denaturing agents. The eluate from each well of the 96-well plate is analyzed for
the presence of radiolabeled protein B by running the samples on SDS-PAGE (4£2).
High-affinity binding peptides will be identified as those that retain the radiolabeled protein B.

The use of metabolically radiolabeled proteins as "the prey" to interact with the
overlapping peptides of "the bait" increases the sensitivity of this technique and allows
the identification of interacting proteins with binding affinities of 10^-10 to 10^-12 M for a standard
50 kDa protein which contains one to ten radiolabeled methionine residues (§20).

Method to Use High Affinity Binding Sequences in High Throughput Assays to Screen for 
Lead Compounds 

The approach described herein to identify high-affinity binding sequences or target sites
for drug development can also be used in high throughput assays to screen for small molecules
from combinatorial libraries. For example, to select drugs that specifically inhibit the binding of
protein A to B (see Figure 4), one or more target sites (the high affinity binding
sequences) are synthesized in each well of the 96-well plates as described earlier. In this example
(Figure 4), the same high affinity binding sequence is synthesized in all of the wells. To each
well containing the high affinity binding sequence, one or more small molecules from a
combinatorial library are added. Following the addition of drug(s), a radiolabeled protein B from
an in vitro transcription-translation mix, for example, is added and allowed to incubate as
indicated above. Following several washes, bound protein B is eluted with SDS-sample buffer.
Wells containing radiolabeled protein B indicate that the drug had no effect on the binding
between the high affinity binding sequence and protein B. Alternatively, if one or more wells do
not contain radiolabeled protein B in the presence of a drug, then that drug has inhibited the
interaction between the high affinity binding sequence of protein A and protein B. Hence, the
latter drug is a good lead compound. Such drugs can now enter the second phase of their
analysis to determine if they prevent the formation of the active complex of full length proteins A
and B. Active drugs that are identified will be tested in vivo to further confirm their mechanism
of action. In this manner, more specific drugs with fewer or no side-effects will be developed.
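
The read-out logic of such a screen can be summarised in a small sketch. Everything numeric here is hypothetical: the function name `call_leads`, the use of a no-drug control well, the 20% cut-off and the signal values are illustrative assumptions, not values from the assay described above. The sketch only encodes the stated rule that a well with little or no radiolabeled protein B in the presence of a drug marks that drug as a candidate lead.

```python
def call_leads(well_signals, control_signal, max_fraction=0.2):
    """Return the drugs whose wells retained little radiolabeled protein B.

    well_signals   -- dict mapping a drug name to the signal measured for
                      protein B eluted from that well (arbitrary units)
    control_signal -- signal from a well incubated without any drug
    max_fraction   -- hypothetical cut-off: a drug is called a lead when the
                      retained signal is at most this fraction of the control
    """
    if control_signal <= 0:
        raise ValueError("control_signal must be positive")
    leads = []
    for drug, signal in well_signals.items():
        if signal / control_signal <= max_fraction:
            leads.append(drug)
    return sorted(leads)


# Made-up numbers for illustration only.
signals = {"compound_1": 980.0, "compound_2": 45.0, "compound_3": 510.0}
print(call_leads(signals, control_signal=1000.0))
```

In practice the cut-off would have to be chosen from replicate control wells; the fixed fraction used here is only a placeholder.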

The latter point provides an advantage since most proteins have more than one
biological function. For example, if protein A interacts with itself, it will perform one function, while
the same protein interacting with a different protein will perform a different function. Moreover,
protein A, when part of a given complex of associated proteins, will mediate several functions;
inhibiting the interactions between proteins A and B, while leaving the interactions between
protein A and proteins C, D, or F intact, will inhibit one or a few cellular pathways. By contrast, inhibiting
the function of protein A will inhibit the functions of the entire complex. In this respect, the
identification, isolation and development of drugs that will specifically inhibit interactions
between two proteins within a complex of proteins should result in more specific drugs with
fewer side effects. In addition, as different proteins are differentially expressed in different
tissues or organs, the composition of a given protein complex will differ between
tissues. Hence, the approach of developing drugs that inhibit protein-protein interactions will
also lead to drugs that are organ- or tissue-specific.

Of course, it will be understood that the present invention also provides 
quantitative assays to measure the protein-protein interaction and the modulation thereof by 
compounds. 

In conclusion, the approach described in this application for the identification of
interacting proteins, the identification of the precise amino acid sequences between interacting
proteins, and the targeting of such specific sequences in proteins with drugs that inhibit
protein-protein interactions has tremendous potential in dictating future drug discovery in the
pharmaceutical industry.

The present invention is illustrated in further detail by the following non-limiting
examples.

EXAMPLE 1 

P-glycoprotein Binding to Tubulin is Mediated
by Sequences in the Linker Domain

The successful treatment of cancer patients with chemotherapeutic drugs is often
limited by the development of drug-resistant tumors. Tumor cell lines selected in vitro with a
single anticancer drug become resistant to a broad spectrum of chemotherapeutic drugs and are termed
multidrug resistant (or MDR) tumor cells (for review, Ul*_45, 34, 496 6). Moreover, the
expression of MDR in these tumor cells has been associated with the overexpression of two
membrane proteins: the MDR1 P-glycoprotein (P-gp) and the multidrug resistance-associated
protein (MRP1) (21*45, 34, 40£ ti). Both P-gp and MRP are members of a large family of
membrane transporter proteins known as ATP Binding Cassette proteins or ABC membrane
transporters (di44). Although the structure of P-gp1 remains a matter of speculation (€021),
cumulative topological evidence suggests a tandemly duplicated structure of six transmembrane
domains and a large cytoplasmic domain encoding an ATP binding sequence (45g, §4^£§). The
two halves of P-gp1 are linked by a stretch of 90 residues rich in polar or charged amino acids,
termed the "linker domain."

The P-gp gene family is made up of three structurally similar isoforms in rodents
(classes I, II, and III) and two isoforms in humans (classes I and III) (4420). Gene transfer
studies suggest functional differences among these structurally similar isoforms. For example,
only the P-gp isoforms of classes I and II confer the MDR phenotype (4625, 73JJL1), while the
class III isoforms do not (7H, 0§2£). The class III isoforms mediate the transfer of
phosphatidylcholine from the inner to the outer leaflet of the plasma membrane (i.e., "flippase")
(§422, 0?HIQ). In normal tissues, P-gp distribution is restricted mainly to tissues with secretory
functions (5879. 7JJL6). Its polarized localization to apical surfaces facing a lumen in the adrenal
gland, liver, kidney and intestine suggests a normal transport or detoxification mechanism.
Moreover, hematopoietic stem cells and specific lymphocyte subclasses also express high levels
of P-gp (3742). The normal function or substrate(s) of classes I and II remain undefined;
however, the disruption of the class I and/or II genes from the mouse genome results in the
accumulation of cytostatic drugs or lipophilic compounds in most normal tissues, but more
strikingly in the brain (0622, 0?1M). Based on these results, it is speculated that the normal
function of P-gp (the class I and II or the MDR-causing P-gp) is detoxification similar to that
seen in MDR cells, especially at the blood-brain barrier (44^2).

High levels of P-gp have been found in many intrinsically drug resistant tumors
from colon, kidney, breast and adrenals as well as in other tumors which had acquired the MDR
phenotype after chemotherapy (for example, in acute non-lymphoblastic leukemia) (40 7 -24 1 -24,
3 2. 35 . 402, 5 43. 78). Several studies have now established an inverse correlation between P-gp
expression and the response to chemotherapy ( 5, §82, 7 41 13). Further, Chan et al. (14£, 1^2)
have shown that P-gp expression was prognostic of MDR and durable response in childhood
leukemia, soft tissue sarcomas and neuroblastomas of children. In light of these studies, there
appears to be convincing evidence, at least in some cancers, that P-gp levels predict the response
to chemotherapeutic treatment.

Direct binding between P-gp and various lipophilic compounds has been
demonstrated using photoactive drug analogues (§ZL=23, 62, 632 4). Certain compounds which
bind to P-gp were shown to reverse the MDR phenotype, presumably by competing for the same
drug binding site in P-gp (234, 2€2§). These compounds, which have been collectively labeled
as MDR-reversing agents, include verapamil, quinidine, ivermectin, cyclosporins, and
dipyridamole analogues, to name but a few (234, 2638). Clinical trials using MDR-reversing
agents (e.g., verapamil or quinidine) have shown some response in tumors that were otherwise
non-responsive to chemotherapy (4722, 3344, 7H7). However, high pharmacological toxicity
associated with several MDR-reversing agents has prevented their use at an effective
concentration (§061). A better clinical response has been observed using other MDR-reversing
agents (i.e., cyclosporin A and its non-immunosuppressive analog PSC833); however,
toxic effects have also been seen with cyclosporins (GSJiH, 7H5).

P-gp was shown to be a substrate for protein kinases C and A (2, 9). Moreover, it
has been demonstrated that agents which modulate protein kinase C activity modulate P-gp
phosphorylation and the MDR phenotype it mediates (42, 313). In one study (2GMX), PMA phorbol
ester (a protein kinase C activator) was shown to increase the MDR phenotype and drug efflux in
MCF7 breast cancer cells. In another study (9£), sodium butyrate treatment of SW620 human
colonic carcinoma cells was shown to result in a large increase in P-gp expression without a
concomitant increase in drug resistance or drug efflux. Interestingly, P-gp in SW620 cells was also
shown to be poorly phosphorylated following sodium butyrate treatment (3£). The reason for
the lack of transport function of P-gp in SW620 cells was not clear; however, mutation of P-gp
phosphorylation sites within the linker domain was shown not to affect its drug transport function
(284Q). By contrast, protein kinase C modulation of serine/threonine residues in the linker
domain regulated the activity of an endogenous chloride channel, which suggests that P-gp is a
channel regulator (3Q41, 3 3110). Thus, although it remains unclear what functions the linker
domain of P-gp1 mediates, it was of interest to identify the proteins that interact with the linker
domain using an in vitro assay. The latter assay is based on the novel understanding of protein
interactions provided by the present invention. The results shown hereinbelow demonstrate that three
sequences in the linker domain bind to proteins with apparent molecular masses of ~80 kDa, 57
kDa and 30 kDa. Purification and partial N-terminal amino acid sequencing of the 57 kDa
protein showed that its sequence matches the N-terminal amino acids of α- and β-tubulins.

Thus, using a protein domain as an example to validate the power of the
present invention, it was demonstrated that: i) this domain is bound specifically by proteins; ii)
the specifically binding proteins can be formally identified; and iii) the sequence responsible
for the specific binding of these proteins can be formally identified (together with the interacting
domain of the binding protein, if derived).

EXAMPLE 2 
Materials 

[35S]methionine (1000 Ci/mmol; Amersham Life Sciences, Inc.) and [125I] goat anti-mouse
antibody were purchased from Amersham Biochemical Inc. Protein-A Sepharose-4B was
purchased from Bio-Rad Life Science. All other chemicals used were of the highest commercial 
grade available. 

EXAMPLE 3 
Peptide Synthesis 

Pre-derivatized plastic rods, active esters and polypropylene trays were
purchased from Cambridge Research Biochemicals (Valley Stream, NY). Peptides were
synthesized on solid polypropylene rods as previously described (2?M, 3822). Briefly, the Fmoc
protecting group on the pre-derivatized polypropylene rods used as solid support (arranged in a
96-well format) was removed by incubation with 20% (v/v) piperidine in dimethylformamide
(DMF) for 30 minutes with shaking. Following the deprotection of the β-alanine spacer on the
polypropylene rods, Fmoc-protected amino acids were dissolved in HOBt/DMF and added to the
appropriate wells containing deprotected rods. Coupling of amino acids was allowed to take
place for 18 hours at room temperature, after which the rods were washed in DMF (1 x 2
minutes), methanol (4 x 2 minutes), and DMF (1 x 2 minutes). The coupling of the second
amino acid required the deprotection of the Fmoc amino protecting group of the first amino acid
and incubation of the rods with the second preactivated Fmoc-protected amino acid
(pentafluorophenyl derivatives). The reaction was allowed to proceed for 18 hours, and the rods
were removed and washed as indicated above. The same steps were repeated for each amino
acid coupling until the sixth amino acid was coupled. Following the last coupling step, the Fmoc
N-terminal protecting group was removed with 20% piperidine/DMF and the free amino
group acetylated for 90 minutes in an acetylation cocktail containing acetic anhydride:
diisopropylethylamine (DIEA): DMF (50:1:50 v/v/v). The side chain protecting groups of the
N-terminal acetylated hexapeptides on the polypropylene rods were removed by incubation in a
cleavage mixture containing trifluoroacetic acid: phenol: ethanedithiol (95:2.5:2.5 v/v/v) for 4
hours at room temperature. After the cleavage step, the rods were washed with dichloromethane
(DCM) and neutralized in 5% (v/v) DIEA/DCM. The deprotected peptide-coupled rods were
washed in DCM and methanol and vacuum dried for 18 hours.
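
As a rough planning aid, the incubation times quoted above can be added up. This sketch counts only the steps whose duration is stated in the text; the 30-minute deprotection is assumed to apply to every cycle and to the final Fmoc removal, and untimed washes are ignored, so the total is a lower bound. All names used are illustrative.

```python
# Durations in minutes, taken from the protocol text where stated.
DEPROTECTION = 30                 # 20% piperidine/DMF (assumed the same in every cycle)
COUPLING = 18 * 60                # 18 hours per amino acid
WASHES = 1 * 2 + 4 * 2 + 1 * 2    # DMF, methanol, DMF
ACETYLATION = 90
SIDE_CHAIN_CLEAVAGE = 4 * 60
VACUUM_DRYING = 18 * 60


def synthesis_minutes(residues=6):
    """Lower-bound bench time for one batch of peptides of `residues` amino acids."""
    per_cycle = DEPROTECTION + COUPLING + WASHES
    finishing = DEPROTECTION + ACETYLATION + SIDE_CHAIN_CLEAVAGE + VACUUM_DRYING
    return residues * per_cycle + finishing


total = synthesis_minutes()
print(f"{total} minutes, about {total / 60:.1f} hours")
```

Because all rods in a block are processed in parallel, this estimate applies to the whole batch of peptides rather than to each peptide separately.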

EXAMPLE 4 
Tissue Culture and Metabolic Labeling of Cells 

Drug sensitive (CEM) and resistant (CEM/VLB1.0) cells were cultured in α-MEM media
supplemented with 10% fetal calf serum (Hyclone Inc.) as previously described (§g). All cells
were examined for Mycoplasma contamination every three months using the Mycoplasma PCR
kit from Stratagene Inc. (San Diego, CA). For metabolic labeling, CEM or CEM/VLB1.0
cells at 70-80% confluency were labeled with [35S]methionine (100 µCi/ml) for 6
hours at 37°C in methionine-free α-MEM media.

EXAMPLE 5
Cell Extraction and Binding Assay

Following metabolic labeling of proteins with [35S]methionine, cells were washed 3
times with phosphate buffered saline (PBS) and resuspended in hypotonic buffer (10 mM KCl,
1.5 mM MgCl2, 10 mM Tris-HCl, pH 7.4) containing protease inhibitors (2 mM PMSF, 3 µg/ml
leupeptin, 4 µg/ml pepstatin A and 1 µg/ml aprotinin) and kept on ice for 30 minutes. Cells
were lysed by homogenization in the hypotonic buffer and the cell lysate was sequentially
centrifuged, first at 6000 x g for 10 minutes. Following the latter centrifugation, the supernatant
was removed and adjusted to a final concentration of 0.5 M NaCl from a stock solution of 4 M NaCl. The
cell lysate was incubated on ice for 30 minutes. The sample was mixed and brought back to a
final concentration of 0.1 M NaCl. The cell lysate was centrifuged for 10 minutes at 15,000 x g at
4°C. The latter supernatant was removed and recentrifuged at 100,000 x g for 60 minutes
in a Beckman ultracentrifuge using an SW55 rotor. The amount of protein in the above samples was
determined by the method of Lowry (§3fi2).
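
The two salt adjustments in this protocol are simple dilution calculations. The sketch below uses hypothetical function names and a hypothetical 7 ml starting volume. It assumes that the supernatant contains no NaCl before the stock is added and that the sample is brought back to 0.1 M by adding NaCl-free buffer; the text does not state how the dilution was performed.

```python
def stock_volume_to_add(sample_volume, stock_conc, final_conc, start_conc=0.0):
    """Volume of concentrated stock to add so the mixture reaches final_conc.

    Derived from the mass balance
    start_conc * V + stock_conc * v = final_conc * (V + v).
    """
    if not start_conc <= final_conc < stock_conc:
        raise ValueError("need start_conc <= final_conc < stock_conc")
    return sample_volume * (final_conc - start_conc) / (stock_conc - final_conc)


def diluent_volume_to_add(sample_volume, current_conc, target_conc):
    """Volume of salt-free buffer to add to lower current_conc to target_conc."""
    if not 0 < target_conc <= current_conc:
        raise ValueError("need 0 < target_conc <= current_conc")
    return sample_volume * (current_conc / target_conc - 1.0)


supernatant_ml = 7.0  # hypothetical volume, for illustration only
stock_ml = stock_volume_to_add(supernatant_ml, stock_conc=4.0, final_conc=0.5)
high_salt_ml = supernatant_ml + stock_ml
buffer_ml = diluent_volume_to_add(high_salt_ml, current_conc=0.5, target_conc=0.1)
print(f"add {stock_ml:.2f} ml of 4 M NaCl, then {buffer_ml:.2f} ml of buffer")
```

Under these assumptions, one volume of 4 M stock is needed for every seven volumes of supernatant, and returning to 0.1 M requires a five-fold dilution of the high-salt sample.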

For a binding assay, [35S]methionine labeled proteins from total cell lysate
were mixed with an equal volume of 3-6% BSA in phosphate buffered saline (PBS) and incubated
with overlapping hexapeptides covalently fixed to polypropylene rods. The peptides and total
cell lysate were incubated overnight at 4°C. The rods were then removed and washed four
times in PBS. The bound proteins were eluted by incubating the peptide-fixed rods in 1X SDS
sample buffer for 60 minutes at room temperature with shaking. The peptide-fixed rods were
regenerated by incubation in PBS containing 2% SDS and 1 mM β-mercaptoethanol at 65°C in a
sonicator for 30 minutes. Following the latter incubation, the rods were washed for five minutes
in 65°C deionized water and two minutes in 65°C methanol. The peptide-fixed rods were then ready
for the next round of screening. In cases where the effects of various detergents on binding were
tested, [35S]methionine labeled proteins from total cell lysate were mixed with an equal volume of
3% BSA in phosphate buffered saline containing KCl (300 mM to 1200 mM), SDS (0.12% to
2%), or CHAPS (20 mM to 160 mM) and incubated with covalently fixed peptides as described
above.

EXAMPLE 6 

Polyacrylamide Gel Electrophoresis and Western Blotting 

Protein fractions (100-150 µl) were resolved on SDS-PAGE using the Laemmli gel
system (47£2). Briefly, proteins were dissolved in 1X solubilization sample buffer I (62.5 mM
Tris-HCl, pH 6.8, containing 2% (w/v) SDS, 10% (w/v) glycerol and 5% β-mercaptoethanol) and
samples were electrophoresed at constant current. Gel slabs containing the resolved proteins were
fixed in 50% methanol and 10% acetic acid. Polyacrylamide gels containing [35S]methionine-labeled
proteins were exposed to Kodak x-ray film following a thirty-minute incubation in an Amplify™
solution (Amersham Inc.).

Alternatively, proteins were transferred to nitrocellulose membrane in Tris-glycine
buffer in the presence of 20% methanol for Western blot analysis according to the
procedure of Towbin et al. (71fl£). The nitrocellulose membrane was incubated in 5% skim
milk/PBS prior to the addition of anti-α or anti-β tubulin monoclonal antibodies (0.5 µg/ml in 3%
BSA; Amersham, Inc.). Following several washes with PBS, the nitrocellulose membrane was
incubated with goat anti-mouse peroxidase-conjugated antibody and immunoreactive proteins
were visualized by chemiluminescence using the ECL method (Amersham, Inc.).

EXAMPLE 7 
Protein Purification and N-terminal Sequencing 

The 57 kDa associated protein was purified using a block of polypropylene rods with two
high affinity binding peptides. Briefly, the peptide-fixed rods were incubated with total cell
lysate as indicated above; however, in this case the carrier substance was gelatin (1%). The
bound proteins were eluted in 100 mM phosphate buffer, pH 7.4, containing 2% SDS and 0.1%
β-mercaptoethanol. The eluted proteins were precipitated by mixing with 9 volumes of ice cold
ethanol and incubated at -20°C. Following a high speed centrifugation of the latter sample (15
minute centrifugation at 15,000 x g at 4°C), the precipitated proteins were resuspended in 1%
SDS in PBS and mixed with an equal volume of 2X SDS Laemmli sample buffer (4762). Protein
samples were resolved by 10% SDS-PAGE and transferred to PVDF membrane. The
migration of the 57 kDa band was visualized by staining the PVDF membrane with Ponceau S.
The portion of the PVDF membrane containing the 57 kDa band was excised and submitted to the protein
sequencing facility at the Biotechnology Service Centre in Toronto, Ontario. Amino acid
sequencing of peptides was performed according to the method of Edman and Begg (+922) using
an Applied Biosystems Gas-Phase Model 470A sequenator™ according to the
procedure described by Flynn (2^22).

EXAMPLE 8 
Identification of P-gp Interacting Proteins

As explained above, P-gp is a tandemly duplicated molecule made up of two 
halves with each encoding for six transmembrane domains and an ATP binding domain. The 
two halves of P-gp are linked by a linker domain. Of the 90 amino acids that make up the linker 
domain, 32 amino acid are either positively or negatively charged at physiological pH. While P- 
gp phosphorylation sites appear to have relevance to P-gp function, the function of the linker 
domain of P-gp remains unknown. To identify and dissect the role of this domain in MDR, the 
overlapping peptides method of the present invention was used. A novel approach was 
developed to isolate interacting proteins using overlapping synthetic hexapeptides. The use of 
overlapping peptides to isolate interacting proteins allows the specific identification of 
interacting proteins and bypasses many of the problems associated with the use of random 
peptides. Figure 5 shows the amino acid sequences of the linker domain of PHE-gp 1 and PJIE- 
gp 3. The two linker domains of PHE-gpl and PHE-gp3 share 41% amino acid sequence 
identity ofand 66% sequence homology. Overlapping hexapeptides were synthesized in parallel 
on derivatized polypropylene rods as previously described (2§2fi, 327). 92 and 90 hexapeptides 
were synthesized to cover the entire linker sequence of PHE-gpl and PH£-gp3, respectively. 
The hexapeptides remain covalently attached to the polypropylene rods. 
To identify the proteins interacting with the various hexapeptides of the linker 
domains, the peptide-fixed rods were incubated with total cell lysate from [35S]methionine 
metabolically labeled CEM or CEM/VLB 1 0 cells. After washing off non-specifically bound 
lysate proteins, the specifically bound proteins were eluted with SDS-containing buffers and 
resolved on SDS-PAGE. Figure 6 shows the proteins specifically bound to the 92 
overlapping hexapeptides from the P-gp1 linker sequence. Three regions in the P-gp1 linker 
domain (617 FKGTYFKI A/TM 627 (SEQ ID NO: 1), 657 SRSSLIRKRSTRRSVRGSQA 676 
(SEQ ID NO: 2) and 693 PVSFWRIMKLNLT 705 (SEQ ID NO: 3)) bound a 57 kDa protein. 
Hexapeptides 46-60, 81-89 and 5-9 (see Figure 5) bound with decreasing affinities to 
the 57 kDa protein (Figure 6). Moreover, peptides 46-60 bound two other proteins 
with apparent molecular masses of 80 kDa and 30 kDa, although much more weakly than the 
57 kDa protein. It is likely that the latter proteins (80 kDa and 30 kDa) are associated with the 
57 kDa protein, since they are detected when the intensity of the 57 kDa protein signal is high 
(Figure 6, peptides 50-56). Comparison of the amino acid sequences of the three 57 kDa-binding 
regions did not reveal significant sequence homology among them to account for their 
binding to the same protein. Interestingly, however, the amino acid sequence of the second 
region (peptides 46-60) encodes protein kinase C consensus sequences (1Q£). In addition, the 
third region (peptides 81-89) was also shown to encode a protein kinase A site (0132). 
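
A run of consecutive binding hexapeptides can be translated into the length of sequence it covers. The sketch below is illustrative only: the helper names `group_runs` and `residues_covered` are hypothetical, and a one-residue offset between neighbouring peptides is assumed, so that peptides numbered i to j cover (j - i) + 6 residues. For peptides 46-60 this gives 20 residues, which is the length of the second region (SEQ ID NO: 2); the other runs need not coincide exactly with the region boundaries reported above.

```python
def group_runs(peptide_numbers):
    """Group peptide numbers into (first, last) runs of consecutive values."""
    runs = []
    for n in sorted(set(peptide_numbers)):
        if runs and n == runs[-1][1] + 1:
            runs[-1] = (runs[-1][0], n)   # extend the current run
        else:
            runs.append((n, n))           # start a new run
    return runs


def residues_covered(first, last, length=6):
    """Residues spanned by peptides first..last, assuming a one-residue offset."""
    return (last - first) + length


# Peptide numbers reported as binding the 57 kDa protein.
binders = list(range(46, 61)) + list(range(81, 90)) + list(range(5, 10))
for first, last in group_runs(binders):
    print(first, last, residues_covered(first, last))
```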

To determine the affinity of binding between the sequences of the 
hexapeptides and the 57 kDa protein, it was of interest to determine the effects of high salt 
(0.3-2.4 M KCl), zwitterionic detergent (10-160 mM CHAPS) and ionic detergent (0.1%-2% SDS) 
on the interactions between the hexapeptides encoded by 657 SRSSLIRKRSTRRSVRGSQA 676 
(SEQ ID NO: 2) and the 57 kDa protein. Our results show the binding to be stable to high salt, 
moderately stable to high concentrations of CHAPS, but sensitive to low concentrations of SDS 
(Figure 7). Given the stability of protein binding to covalently attached peptides in the presence 
of 10 mM CHAPS, it was of interest to determine the binding of the hexapeptides from the P-gp1 
linker domain to CHAPS-soluble proteins, which could include integral membrane proteins. The 
results in Figure 8 show proteins bound to the same overlapping hexapeptides that encode the 
linker domain of P-gp1. Although hexapeptides 46-60, 81-89 and 5-9 (see 
Figure 5) bound to the 57 kDa protein (Figure 7), other proteins were found to interact with the 
same or different hexapeptides, which did not bind proteins in the absence of 10 mM CHAPS. 
For example, hexapeptides 3-10 bound to a ~210 kDa protein that was not detected previously in 
the absence of CHAPS. Similarly, hexapeptides 16-20, which did not bind any proteins in the 
absence of CHAPS, bound to the same high molecular weight protein (Figure 7). Peptides 40-60 
bound more strongly to several low molecular weight proteins (~45-25 kDa) in the presence of 
CHAPS. Hexapeptides 80-89 bound to two other proteins in addition to the 57 kDa protein. 
Taken together, the results in Figure 8 demonstrate that the binding of the various 
hexapeptides to the 57 kDa protein is resistant to mild zwitterionic detergents such as CHAPS. 
Moreover, the solubilization of membrane proteins in 10 mM CHAPS reveals binding to other 
proteins not seen in the absence of 10 mM CHAPS. One possibility is that 10 mM CHAPS 
allows integral membrane proteins to interact with the various hexapeptides of the P-gp1 linker 
domain. Alternatively, CHAPS exposes new domains that in turn allow binding to 
hexapeptides of the P-gp1 linker domain. In addition, some of the lower molecular weight 
proteins that bound to hexapeptides 40-60 and 80-89 may be degradation products of the 57 kDa 
protein (Figure 8). 

The P-gp gene family in man comprises two isoforms, P-gp1 
and P-gp3 (or mdr1 and mdr3; (442fi)). However, as indicated earlier, only P-gp1 
confers an MDR phenotype. Moreover, although P-gp1 and P-gp3 share about 80% amino acid 
sequence homology (731H), the linker domain is the most variable domain between the two 
isoforms, with 66% amino acid sequence homology. To determine if the P-gp3 linker 
domain binds to the same or different proteins, overlapping hexapeptides encoding the P-gp3 
linker domain were synthesized on polypropylene rods and their binding to soluble proteins was 
examined as indicated above. Figure 9 shows the profile of proteins binding to the hexapeptides 
of P-gp3. Interestingly, a protein of similar molecular weight (57 kDa) also bound to the 
hexapeptides from P-gp3. However, the binding to some hexapeptides was different from 
that seen with P-gp1 (Figure 6 versus Figure 9). For P-gp3, three larger stretches of 
amino acids (618 LMKKEGVYFKLVNM 631 (SEQ ID NO: 4), 
648 KAATRMAPNGWKSRLFRHSTQKNLKNS 674 (SEQ ID NO: 5) and 
695 PV5 ; FI KVT KT NKT 707 (SEQ ID NO: 6)) bound to the 57 kDa protein. The first and third 
regions of the P-gp3 linker domain share considerable sequence identity with the first and third 
regions of the P-gp1 linker domain (Figure 10). Hence, it is not surprising that the same 
hexapeptides bound to the same protein. The second regions of the P-gp1 and P-gp3 linker 
domains are different (Figure 10). Consequently, although both the P-gp1 and P-gp3 
sequences bound to a 57 kDa protein, the region of interaction between P-gp3 and the 57 kDa 
protein is larger than that of P-gp1 (Figure 6 and Figure 9). A comparison of the amino 
acid sequences from P-gp1 and P-gp3 binding hexapeptides is shown in Figure 10. 

EXAMPLE 9 
Purification and Sequencing of the 57 kDa Protein 

To determine the identity of the 57 kDa protein, several copies of two hexapeptides 
(658 RSSLIR 663 (SEQ ID NO: 7) and 669 SVRGSQ 674 (SEQ ID NO: 8)) from the second region of the 
P-gp1 linker domain were synthesized. These hexapeptide sequences were those that 
bound with the highest affinity to the 57 kDa protein. Figure 11 shows the binding of these two 
peptides to total cell lysate from [35S]methionine metabolically labeled cells. Both hexapeptides 
bound specifically to the 57 kDa protein and to another protein with an apparent molecular mass of 
~41 kDa. Interestingly, longer incubation times of the total cell lysate led to an increase in the 
level of the 41 kDa protein (Figure 11). Thus, the 41 kDa band is likely a degradation product 
of the 57 kDa protein. 

To purify the 57 kDa protein using the two hexapeptides, it was of interest to 
determine if carrier proteins other than BSA can be used. Figure 12 shows the effects of no 
blocking carrier, 1% gelatin and 0.3% or 3% BSA on the binding of the hexapeptides to the 57 
kDa protein. The results of this experiment were surprising in that no carrier protein was 
required to reduce nonspecific binding (Figure 12). The binding conditions thus established 
were used to isolate large amounts of the 57 kDa protein that bound to several copies of 
hexapeptides 658 RSSLIR 663 (SEQ ID NO: 7) and 669 SVRGSQ 674 (SEQ ID NO: 8). Figure 13 
shows purified 57 kDa protein on SDS-PAGE stained with Coomassie blue. The purified 
protein was transferred to PVDF membrane and stained with Ponceau S to localize the position 
of the 57 kDa protein. The Ponceau S-stained band that migrated with the expected molecular 
mass was cut out and used for direct N-terminal sequencing (3 233). The first seven rounds of 
Edman degradation showed two sequences, MREVISI and MREIVHI. These two sequences 
differed by only three amino acids (VIS instead of IVH). Comparison of the two sequences with 
known protein sequences using the FastA protein search engine showed them to 
encode the first seven N-terminal amino acids of α- and β-tubulins. The identification of 
tubulins as the 57 kDa protein was consistent with the apparent molecular mass and the potential 
degradation products that were observed following long incubation periods. To further confirm 
the identity of the 57 kDa protein as tubulins, Western blot analysis was performed on 
hexapeptide-bound 57 kDa protein and total cell lysate resolved on SDS-PAGE and 
transferred to nitrocellulose membrane. The nitrocellulose membrane was then probed with 
anti-α-tubulin and anti-β-tubulin monoclonal antibodies, respectively. Figure 14 shows the results of 
the Western blot analysis. Consistent with the sequencing results, both tubulin subunits (α and β) 
were recognized in the lanes containing the hexapeptide-bound proteins, thus establishing the 
identity of the 57 kDa protein as α- and β-tubulin. 
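
The comparison of the two N-terminal sequences obtained by Edman degradation can be checked position by position. The following is a minimal sketch; the function name `mismatches` is hypothetical and the comparison assumes the two reads are aligned residue for residue, with no gaps.

```python
def mismatches(seq_a, seq_b):
    """List (position, residue_a, residue_b) where two aligned sequences differ.

    Positions are 1-based. Both sequences must have the same length.
    """
    if len(seq_a) != len(seq_b):
        raise ValueError("sequences must be the same length")
    return [(i + 1, a, b)
            for i, (a, b) in enumerate(zip(seq_a, seq_b))
            if a != b]


diffs = mismatches("MREVISI", "MREIVHI")
print(diffs)        # [(4, 'V', 'I'), (5, 'I', 'V'), (6, 'S', 'H')]
print(len(diffs))   # 3
```

The three differences fall at positions 4 to 6, in agreement with the VIS/IVH difference noted above.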

EXAMPLE 10 

The power of the overlapping peptide spanning method of the invention was thus 
validated with P-gp. As shown above, the overlapping peptide-based method of the present 
invention provides proof of principle for the hypothesis that the region between 
two interacting proteins consists of high affinity binding sequences and repulsive sequences, and 
shows that such a method can be used efficiently and successfully to identify and 
characterize domains and sequences of interacting proteins. The balance of high affinity and 
repulsive forces determines whether two proteins will form a stable complex. The use of short 
overlapping peptides allows the identification of such high affinity binding sequences between 
bait and prey proteins. The rationale for using short overlapping peptides to isolate high 
affinity binding sequences is essential to the success and efficiency of the proof of principle 
described herein. For instance, larger peptides could contain both high affinity and repulsive 
binding sequences in one peptide sequence such that the net force of interaction is negative. 
Moreover, the use of overlapping peptides that differ by one amino acid from the previous or 
next peptide reduces the possibility of nonspecific binding. Thus, overlapping peptides often 
demonstrate a peak in the binding affinity of various peptides (see Figures 7 and 4). The skilled 
artisan will understand that longer overlapping peptides could also be used. Unfortunately, such 
larger peptides increase the risk of missing the identification of interacting proteins due to a 
change in the balance between high affinity and repulsive amino acids. 
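
The reasoning about short versus long peptides can be illustrated with a toy calculation. Everything in this sketch is an assumption made for illustration: the per-residue scores are invented, not measured, the additive scoring is a simplification, and the function name `window_scores` is hypothetical. The sketch only shows how a long peptide carrying both attractive and repulsive residues can have a negative net score while short overlapping windows still reveal a positive peak.

```python
def window_scores(scores, length=6):
    """Sum per-residue scores over every overlapping window of `length`."""
    return [sum(scores[i:i + length])
            for i in range(len(scores) - length + 1)]


# Invented per-residue contributions (positive = attractive,
# negative = repulsive). These values are hypothetical.
residue_scores = [1, 2, 3, 3, 2, 1, -3, -4, -5, -4, -3, -2]

net = sum(residue_scores)
per_hexapeptide = window_scores(residue_scores)

print(net)              # -9: the full 12-residue peptide is net repulsive
print(per_hexapeptide)  # [12, 8, 2, -6, -13, -18, -21]: peak at the first window
```

In this toy case the first hexapeptide scores highest and the score falls off as the window slides into the repulsive stretch, which mirrors the graded binding across neighbouring peptides described above.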

The binding of the 57 kDa protein to three different regions in the P-gp1 and 
P-gp3 linker domains is consistent with the hypothesis proposed herein to explain protein 
interactions (see principle of protein-protein interactions). The high affinity binding domains 
vary in size from 10 to 26 amino acids in length. In the case of the P-gp1 and P-gp3 linker 
domains, two of the three high affinity binding domains shared considerable sequence identity. 
The third high affinity binding regions of the linker domains (658 SRSSLIRKRSTRRSVRGSQA 677 
(SEQ ID NO: 2) versus 648 KAATRMAPNGWKSRLFRHSTQKNLKNS 674 (SEQ ID NO: 5)) 
shared no homology in their primary amino acid sequence. However, helical wheel presentation 
of these two domains shows a cluster of positively charged residues on one face of the helix and 
a cluster of serine/threonine residues on the other (see Figure 15). Interestingly, the region 
of highest binding affinity to the 57 kDa protein encodes the three putative phosphorylation sites 
in P-gp1 (105). The positions of the phosphorylation sites in P-gp3 have not been 
determined experimentally; however, they encode the consensus sequence of protein kinase C. 
In this respect, it is possible that P-gp1 and P-gp3 interactions at the linker domains are 
modulated by phosphorylation of this domain. Thus, although mutations of P-gp phosphorylation 
sites within the linker domain were shown not to affect its drug transport function (2Q4fl), other 
proposed functions of P-gp1 (e.g., regulator of an endogenous chloride channel) were shown to 
be affected by its phosphorylation state (3Q£L T ailOV). Indeed, a member of the ABC 
transporters, CFTR (the cystic fibrosis transmembrane conductance regulator), which encodes 
a similar linker domain, was found to co-localize with the microtubule network (7102). 
Furthermore, microtubule-dependent acute recruitment of CFTR to the apical plasma membrane 
of T84 cells was responsive to elevations in intracellular cAMP and phosphorylation of the linker 
domain (7102). Taken together, although it is not clear whether phosphorylation plays a role in 
modulating P-gp functions in a tubulin-dependent manner, given the co-localization of P-gp1 
phosphorylation and binding to tubulin, such a possibility is likely. Work is in progress to 
determine if phosphorylated hexapeptides bind to tubulin using the assay described herein. Thus, 
the present invention opens the door to the validation of a physiologically relevant interaction 
between proteinaceous domains. 
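
A helical wheel projection places successive residues 100 degrees apart around the helix axis, which corresponds to 3.6 residues per turn in an ideal α-helix. The sketch below computes those angles for the P-gp1 region (SEQ ID NO: 2) and lists them separately for the positively charged residues (K, R) and for serine/threonine. It is a sketch under stated assumptions: the function names are hypothetical, the region is treated as an ideal α-helix, and the starting angle of 0 degrees for the first residue is arbitrary. It does not reproduce Figure 15.

```python
def helical_wheel(sequence, degrees_per_residue=100):
    """Return (position, residue, angle) tuples for an ideal alpha-helix.

    Positions are 1-based; the first residue is placed at 0 degrees.
    """
    return [(i + 1, aa, (i * degrees_per_residue) % 360)
            for i, aa in enumerate(sequence)]


def angles_for(sequence, residues):
    """Angles on the wheel of every residue whose letter is in `residues`."""
    return [angle for _, aa, angle in helical_wheel(sequence) if aa in residues]


region = "SRSSLIRKRSTRRSVRGSQA"  # SEQ ID NO: 2
print("K/R angles:", sorted(angles_for(region, "KR")))
print("S/T angles:", sorted(angles_for(region, "ST")))
```

Residues whose angles fall within the same arc of the wheel lie on the same face of the helix, so comparing the two printed lists gives a quick view of how the two residue classes are distributed.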

The possibility that the 57 kDa protein binds to the polypropylene rods or their 
derivatized moieties is unlikely, since all other rods, which are similarly derivatized, did not bind 
the 57 kDa protein. Moreover, hexapeptides synthesized on at least four separate occasions bound to 
the same proteins. Finally, hexapeptides encoding the first and third high affinity binding regions 
of the linker domains of P-gp1 and P-gp3 bound to the 57 kDa protein. In addition to the 
57 kDa protein, other proteins with apparent molecular masses of ~80 kDa and 30 kDa also 
bound to some of the hexapeptides in the linker domains. However, the binding of these proteins 
was much weaker than that of the 57 kDa protein, and they may be associated proteins. Although 
direct measurements of binding affinities between the various hexapeptides and the 57 kDa protein 
have not been done, it is interesting that this interaction is resistant to 10 mM CHAPS and high 
salt. Moreover, the presence of 10 mM CHAPS in the incubation mix led to the binding of 
other proteins (most notably the ~210 kDa protein) to several stretches of hexapeptides which did 
not bind in the absence of 10 mM CHAPS. The binding of the latter proteins to hexapeptides 
15-28 is likely due to the extraction of proteins from the membranous material, which were 
excluded in the absence of CHAPS. In the absence of CHAPS, the cell lysate contained soluble 
proteins and membrane-associated proteins only. 

The physiological significance of P-gp1 or P-gp3 binding to tubulin is not 
clear. However, tubulin has been shown to interact with several membrane proteins (3842, 350. 
SI, §&6r-&7). P-gp1 or P-gp3 interactions with tubulin and possibly microtubules may be 
an example of the membrane-skeleton fence model (43f£). In this model, a small fraction of 
membrane receptors seem to be fixed to the underlying cytoskeleton (6425). It is interesting in 
this respect that increases in the stability and expression of P-gp in rat liver tumors in vivo are 
associated with similar increases in the stability of several cytoskeleton proteins, including 
α-tubulin, β-actin, and cytokeratins 8/18 (£48). Work is in progress to determine the functional 
significance of P-gp interactions with tubulin in vivo. 

EXAMPLE 11 

The Overlapping Peptides Spanning Method is not Limited to P-gp Interacting Proteins 

The overlapping peptide approach of the present invention has been further validated with 
Annexin I, a soluble and membrane associated protein, as opposed to P-glycoprotein, a strictly 
transmembrane protein. Annexin is thus structurally and functionally different from P- 
glycoprotein. 

Using this approach, several proteins that interact with Annexin I, and the precise amino 
acid sequences of Annexin I which mediate these interactions, were identified. Annexin I is a 
member of a large family of intracellular soluble and membrane associated proteins that bind 
phospholipids in a reversible and calcium-dependent manner. Various members of the Annexin 
family have been implicated in a number of different intracellular processes including vesicular 
trafficking, membrane fusion, exocytosis, signal transduction, ion channel formation and drug 
resistance. Given the many possible physiological functions of Annexin I, the method of the 
present invention was used to identify its interacting proteins and the precise amino acid 
sequences that mediate Annexin I interactions therewith. 

Briefly, overlapping peptides corresponding to the entire 
amino acid sequence of Annexin I (total of ~340 peptides plus controls) were synthesized on a 
solid support as described above. In this case, overlapping heptapeptides, as opposed to 
hexapeptides, were used. The peptides were then incubated with total cellular proteins isolated 
from MCF7 breast tumor cells that were metabolically labeled with [35S]methionine. Following 
several washes, the bound proteins were eluted and resolved on SDS-PAGE as outlined above. 
The results are consistent with the previous results with P-glycoprotein, as the method led to the 
identification of several islands of Annexin I amino acid sequences (data not shown) which 
interacted with five proteins ranging in molecular mass from 10 kDa to 200 kDa (specifically, 
~10 kDa, ~29 kDa, ~85 kDa, ~106 kDa and ~200 kDa). Briefly, eight interacting domains 
having high affinity for the cellular proteins of the extract were identified. Two of these high 
affinity islands were located in the tail domain of Annexin I (residues 1-36) and six in the 
α-helical bundles of Annexin I (residues 37 to the end; see for example WO 99/21980). The 
identity of the latter interacting proteins is presently under study. However, the interaction of a 10 
kDa protein with Annexin I is consistent with earlier work which demonstrated a direct 
interaction between Annexin I and S100C protein (Mailliard, W.S., Haigler, H.T., and Schlaepfer, 
D.D. 1996, J. Biol. Chem., 271: 719-725). 

Thus, the present invention is shown to enable the simple and efficient 
identification of high affinity protein interactions, as well as the simultaneous 
identification of the precise amino acid sequence of at least one of the interacting partners. 

CONCLUSIONS 

In conclusion, a simple approach to identify P-gp interacting proteins from a total 
cell lysate has been used. Moreover, this approach allows for the identification of the precise 
amino acid sequences in the P-gp1 and P-gp3 linker domains that mediate the protein interactions 
with tubulins. In addition, knowledge of the high affinity binding sequences allows for the 
subsequent purification of the interacting proteins from a total mixture of cellular proteins, as 
further exemplified with Annexin I. Indeed, given the simplicity of this approach to study 
protein-protein interactions, it is easily applied to other proteins. Finally, our approach is rapid 
and has several advantages over other currently used approaches. 

Although the present invention has been described hereinabove by way of preferred 
embodiments thereof, it can be modified, without departing from the spirit and nature of the 
subject invention as defined in the appended claims. 
WHAT IS CLAIMED IS: 

10. A method for identifying a polypeptide that binds to a peptide in a chosen protein, 
wherein said polypeptide is not an antibody, comprising: 

(a) providing a set of overlapping peptides spanning a complete sequence of at least a 
domain of the chosen protein, the set of overlapping peptides being attached to a support; 

(b) contacting the support with a mixture of polypeptides under conditions enabling 
binding between the support and a polypeptide of the mixture; 

(c) washing the support to remove unbound polypeptides of the mixture; and 

(d) identifying the polypeptide that binds to the support; 

wherein a polypeptide that binds to the support is the polypeptide that binds to the 
peptide in the chosen protein. 



11. The method of claim 10, wherein the polypeptide that binds to the peptide in the chosen 
protein binds to a high affinity domain of the chosen protein. 

12. The method of claim 10, wherein the support is selected from the group consisting of a 
chip, bead, and plate. 

13. The method of claim 10, wherein the set of support-attached overlapping peptides is 
synthesized synthetically using the amino acid sequence of the chosen protein. 

14. The method of claim 10, wherein each of the peptides in the set of 
support-attached overlapping peptides is from about 5 amino acids to about 15 amino acids 
in length. 

15. The method of claim 10, wherein each of the peptides in the set of 
support-attached overlapping peptides is from about 5 amino acids to about 12 amino acids 
in length. 

16. The method of claim 10, wherein each of the peptides in the set of 
support-attached overlapping peptides is from about 5 amino acids to about 10 amino acids 
in length. 

17. The method of claim 10, wherein each of the peptides in the set of 
overlapping peptides is from about 5 amino acids to about 7 amino acids 
in length. 

18. The method of claim 10, wherein the set of overlapping peptides is 
covalently attached to the support. 

19. The method of claim 10, wherein the support is contacted with a lysate 
from a cell, wherein the lysate comprises the mixture of polypeptides. 

20. The method of claim 10, wherein the chosen protein is human 
P-glycoprotein 1. 

21. The method of claim 20, wherein the domain is selected from the group 
consisting of a first domain consisting of the amino acid sequence of SEQ ID NO: 1, a 
second domain consisting of the amino acid sequence of SEQ ID NO: 2, a third domain 
consisting of the amino acid sequence of SEQ ID NO: 3, and a combination of the first, 
second, and third domains. 

22. The method of claim 20, wherein the set of overlapping peptides 
comprises a first peptide consisting of an amino acid sequence of SEQ ID NO: 7 and a 
second peptide consisting of an amino acid sequence of SEQ ID NO: 8. 

23. The method of claim 20, wherein the polypeptide is tubulin. 

24. The method of claim 10, wherein the chosen protein is human 
P-glycoprotein 3. 

25. The method of claim 24, wherein the domain is selected from the group 
consisting of a first domain consisting of the amino acid sequence of SEQ ID NO: 4, a 
second domain consisting of the amino acid sequence of SEQ ID NO: 5, a third domain 
consisting of the amino acid sequence of SEQ ID NO: 6, and a combination of the first, 
second, and third domains. 

26. A method for identifying a peptide in a chosen protein that binds to a 
polypeptide, wherein said polypeptide is not an antibody, the method comprising: 

(a) providing a set of overlapping peptides spanning a complete 
sequence of at least a domain of the chosen protein, the set of overlapping peptides being 
attached to a support; 

(b) contacting the support with a polypeptide under conditions enabling binding 
between the support and the polypeptide; 

(c) washing the support to remove unbound polypeptide; and 

(d) identifying the peptide attached to the support that binds to the polypeptide. 




ip the pey 
bin a high affinity domain of the 

28. The method of claim 26, wherein the support is contacted with the mixture of 
polypeptides under conditions enabling binding between the support and the polypeptide 
of the mixture. 

29. The method of claim 26, wherein the support is selected from the group consisting of a 
chip, bead, and plate. 

30. The method of claim 26, wherein the set of support-attached overlapping peptides of the 
support is synthesized synthetically using the amino acid sequence of the chosen protein. 

31. The method of claim 26, wherein each of the peptides in the set of support-attached 
overlapping peptides is from about 5 amino acids to about 15 amino acids in length. 

32. The method of claim 26, wherein each of the peptides in the set of support-attached 
overlapping peptides is from about 5 amino acids to about 12 amino acids in length. 

33. The method of claim 26, wherein each of the peptides in the set of support-attached 
overlapping peptides is from about 5 amino acids to about 10 amino acids in length. 

34. The method of claim 26, wherein each of the peptides in the set of support-attached 
overlapping peptides is from about 5 amino acids to about 7 amino acids in length. 

35. The method of claim 26, wherein the set of overlapping peptides is covalently attached 
to the support. 

36. The method of claim 26, wherein the chosen protein is human P-glycoprotein 1. 

37. The method of claim 36, wherein the domain is selected from the group consisting of a 
first domain consisting of the amino acid sequence of SEQ ID NO: 1, a second domain 
consisting of the amino acid sequence of SEQ ID NO: 2, a third domain consisting of the 
amino acid sequence of SEQ ID NO: 3, and a combination of the first, second, and third 
domains. 

38. The method of claim 36, wherein the set of overlapping peptides comprises a first 
peptide consisting of an amino acid sequence of SEQ ID NO: 7 and a second peptide 
consisting of an amino acid sequence of SEQ ID NO: 8. 

39. The method of claim 36, wherein the polypeptide is tubulin. 

40. The method of claim 26, wherein the chosen protein is human P-glycoprotein 3. 

41. The method of claim 40, wherein the domain is selected from the group consisting of a 
first domain consisting of the amino acid sequence of SEQ ID NO: 4, a second domain 
consisting of the amino acid sequence of SEQ ID NO: 5, a third domain consisting of the 
amino acid sequence of SEQ ID NO: 6, and a combination of the first, second, and third 
domains. 

42. A method of identifying a compound that modulates the binding of a polypeptide to a 
peptide in a chosen protein, wherein said polypeptide is not an antibody, comprising: 

(a) providing a set of overlapping peptides spanning a complete sequence of 
at least a domain of the chosen protein, the set of overlapping peptides being 
attached to a support; 

(b) contacting the support with a candidate compound and the 
polypeptide under conditions enabling binding between the support and the 
polypeptide; 

(c) washing the support to remove unbound polypeptides of the mixture; 
and 

(d) detecting binding of the polypeptide to the support; 

wherein a change in the binding of the polypeptide to the support in the 
presence of the candidate compound compared to the binding of the polypeptide to the 
support in the absence of the candidate compound identifies the candidate compound as a 
compound that modulates binding of the polypeptide to the peptide in the chosen protein. 

43. The method of claim 42, wherein the domain of the chosen protein is a high 
affinity domain of the chosen protein. 

44. The method of claim 42, wherein the polypeptide is known to bind to the chosen 
protein. 

45. The method of claim 42, wherein the support is selected from the group consisting of a 
chip, bead, and plate. 

46. The method of claim 42, wherein the set of support-attached overlapping 
peptides of the support is synthesized synthetically using the amino acid sequence of the 
chosen protein. 

47. The method of claim 42, wherein each of the peptides in the set of support-attached 
overlapping peptides is from about 5 amino acids to about 15 amino acids in length. 

48. The method of claim 42, wherein each of the peptides in the set of support-attached 
overlapping peptides is from about 5 amino acids to about 12 amino acids in length. 

49. The method of claim 42, wherein each of the peptides in the set of support-attached 
overlapping peptides is from about 5 amino acids to about 10 amino acids in length. 

50. The method of claim 42, wherein each of the peptides in the set of support-attached 
overlapping peptides is from about 5 amino acids to about 7 amino acids in length. 

51. The method of claim 42, wherein the set of overlapping peptides is covalently attached 
to the support. 

5^ The method of claim 42, wherein the chosen protein is human P-glycoprotein L 

53. The method of claim 52. wherein the domain is selected from the group consisting of a 
first domain consisting of the amino acid sequence of SEQ TP NO: 1. a second domain 
consisting of the amino acid sequence of SEQ IP NO: 2. a third domain consisting of the 
amino acid sequence of SEQ TP NO: 3 and a combination of the first, second, and third 
domains. 
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54. The method of claim 52. wherein the set of overmanning peptides comprises a first 
pentide consisting of an amino acid sennence of SEP ID NOt 7 and a second nentide 
consisting of an amino acid sequence of SEP TP NO: 8. 

5L The method of claim 52, wherein the polypeptide is tubulin, 

giu The method of claim 42, wherein the chosen protein is human P-glycoprptein 3. 

57. The method of claim 56, wherein the domain is selected from the group consisting of a
first domain consisting of the amino acid sequence of SEQ ID NO: 4, a second domain
consisting of the amino acid sequence of SEQ ID NO: 5, a third domain consisting of the
amino acid sequence of SEQ ID NO: 6, and a combination of the first, second, and third
domains.

58. A support to which is attached a set of overlapping peptides spanning a complete
sequence of at least a domain of a protein.

59. The support of claim 58, wherein the domain of the protein is a high affinity domain of
the protein.

60. The support of claim 58, wherein the set of overlapping peptides spans the complete
sequence of the entire protein.

61. The support of claim 58, wherein the support is selected from the group consisting of a
chip, bead, and plate.

62. The support of claim 58, wherein the set of support-attached overlapping peptides of
the support is synthesized synthetically using the amino acid sequence of the chosen
protein.

63. The support of claim 58, wherein each of the peptides in the set of support-attached
overlapping peptides is from about 5 amino acids to about 15 amino acids in length.

64. The support of claim 58, wherein each of the peptides in the set of support-attached
overlapping peptides is from about 5 amino acids to about 12 amino acids in length.

65. The support of claim 58, wherein each of the peptides in the set of support-attached
overlapping peptides is from about 5 amino acids to about 10 amino acids in length.

66. The support of claim 58, wherein each of the peptides in the set of support-attached
overlapping peptides is from about 5 amino acids to about 7 amino acids in length.

67. The support of claim 58, wherein the set of overlapping peptides is covalently attached
to the support.

68. The support of claim 58, wherein a polypeptide that binds to a peptide attached to the
support is identified as a polypeptide that binds to the protein.

69. The support of claim 58, wherein the chosen protein is human P-glycoprotein 1.

70. The support of claim 69, wherein the domain is selected from the group consisting of a
first domain consisting of the amino acid sequence of SEQ ID NO: 1, a second domain
consisting of the amino acid sequence of SEQ ID NO: 2, a third domain consisting of the
amino acid sequence of SEQ ID NO: 3, and a combination of the first, second, and third
domains.

71. The support of claim 69, wherein the set of overlapping peptides comprises a first
peptide consisting of an amino acid sequence of SEQ ID NO: 7 and a second peptide
consisting of an amino acid sequence of SEQ ID NO: 8.

72. The support of claim 58, wherein the chosen protein is human P-glycoprotein 3.

73. The support of claim 72, wherein the domain is selected from the group consisting of a
first domain consisting of the amino acid sequence of SEQ ID NO: 4, a second domain
consisting of the amino acid sequence of SEQ ID NO: 5, a third domain consisting of the
amino acid sequence of SEQ ID NO: 6, and a combination of the first, second, and third
domains.

74. A method for purifying tubulin comprising:

a) contacting a sample containing tubulin with a support to which is attached a first
peptide consisting of an amino acid sequence of RSSLIR and a second peptide
consisting of an amino acid sequence of SVRGSQ, wherein the contacting is under
conditions enabling binding between the support and the tubulin in the sample;

b) rinsing the sample-contacted support to remove unbound molecules in said sample;
and

c) eluting said tubulin bound to said support;

wherein said tubulin eluted from said support is purified.

8. The method of claim 5, 6 or 7, wherein said support is chosen from a
chip, a bead, or a plate.

9. A modulator of high affinity interaction identified by any one of said method of claims 5 - 9.

ABSTRACT OF THE DISCLOSURE 

The invention relates to protein-protein interactions and methods for identifying
interacting proteins and the amino acid sequence at the site of interaction. Using overlapping
hexapeptides that span the entire amino acid sequences of the linker domains of human
P-glycoprotein gene 1 and 3 (HP-gp1 and HP-gp3), a direct and specific binding between HP-gp1
and 3 linker domains and intracellular proteins was demonstrated. Three different stretches
( 617 EKGIYFKLVTM 627 , 658 SRSSLIRKRSTRRSVRGSQA 677 and 694 PVSFWRIMKLNLT 706 for
HP-gp1 and 618 LMKKEGVYFKLVNM 631 , 648 KAATRMAPNGWKSRLFRHSTQKNLKNS 674
and 695 PVSFLKVLKLNKT 707 for HP-gp3) in linker domains bound to proteins with apparent
molecular masses of ~80 kDa, 57 kDa and 30 kDa. The binding of the 57 kDa protein was
further characterized. Purification and partial N-terminal amino acid sequencing of the 57 kDa
protein showed that it corresponds to the N-terminal amino acids of alpha- and beta-tubulins. The
method of the present invention was further validated with Annexin. The present invention thus 
demonstrates a novel concept whereby the interactions between two proteins are mediated by 
strings of a few amino acids with high and repulsive binding energies, enabling the identification
of high-affinity binding sites between any interacting proteins.